4.3.1 Bagging-Style Methods
Bagging-style CIL methods use diverse training samples (we call them bags for convenience) to train independent base learners in parallel, much like Bagging. The difference is that Bagging-style CIL methods construct a balanced data sample in each iteration to train a base learner capable of handling imbalanced data, whereas Bagging uses a bootstrap sample, whose data distribution is identical to the underlying data distribution, to train a base learner that maximizes accuracy. Different bag construction strategies lead to different Bagging-style methods. Some methods use under-sampling to reduce the majority class examples in a bag, such as UnderBagging [20]; some use over-sampling to increase the minority class examples, such as OverBagging [20]; and some use a hybrid of sampling and synthetic example generation to balance the data, such as SMOTEBagging [20]. Other methods partition the majority class into a set of disjoint subsets of size n+, combining each subset with all the minority class examples to construct a bag, such as Chan and Stolfo's method [5].
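To make the shared structure concrete, here is a minimal sketch (in Python, with hypothetical helper names such as make_bag and majority_vote, and assuming integer class labels): one base learner is trained per balanced bag, and the predictions are combined by voting. The subsections below differ mainly in how the bag is built.

    import numpy as np
    from sklearn.base import clone
    from sklearn.tree import DecisionTreeClassifier

    def bagging_style_cil(X, y, make_bag, n_estimators=10,
                          base_learner=DecisionTreeClassifier(), seed=0):
        # make_bag(X, y, rng) is assumed to return one balanced bag (Xb, yb) per call
        rng = np.random.default_rng(seed)
        return [clone(base_learner).fit(*make_bag(X, y, rng))
                for _ in range(n_estimators)]

    def majority_vote(learners, X):
        # assumes integer class labels; returns the most-voted label for each example
        votes = np.stack([clf.predict(X) for clf in learners])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)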
4.3.1.1 UnderBagging/OverBagging Both UnderBagging and OverBagging construct a bag by obtaining two samples of the same size n · a%, one from the minority class and one from the majority class, by sampling with replacement, where a% varies from 10% to 100% [20]. When a% = n+/n, under-sampling is conducted to remove majority class examples, which leads to UnderBagging; when a% = 100%, over-sampling is conducted to increase the minority class examples, which leads to OverBagging; otherwise, both under- and over-sampling are conducted, which leads to a hybrid Bagging-style ensemble. The base learners are combined by majority voting.
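A minimal sketch of this bag construction is given below, assuming that n denotes the majority class size and that the minority class is labeled 1 (both assumptions for illustration); plugging it into the generic loop above yields an UnderBagging- or OverBagging-style ensemble depending on a%.

    import numpy as np

    def under_over_bag(X, y, a_percent, rng, minority_label=1):
        min_idx = np.where(y == minority_label)[0]
        maj_idx = np.where(y != minority_label)[0]
        n = len(maj_idx)                          # majority class size (assumed meaning of n)
        size = int(round(n * a_percent / 100.0))  # both classes are resampled to n * a%
        idx = np.concatenate([rng.choice(min_idx, size=size, replace=True),
                              rng.choice(maj_idx, size=size, replace=True)])
        return X[idx], y[idx]                     # a% = n+/n -> UnderBagging; a% = 100 -> OverBagging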
4.3.1.2 SMOTEBagging SMOTEBagging is similar to OverBagging in that a sample of n minority class examples is drawn [20]. The difference lies in how the sample of minority examples is obtained: SMOTEBagging samples n · b% minority class examples from the minority class and then generates n · (1 - b%) synthetic minority class examples by SMOTE [21].
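The sketch below illustrates how such a minority sample could be assembled; the SMOTE step is a simplified nearest-neighbour interpolation written for illustration, not the reference implementation of [21], and the function name is an assumption.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def smote_bag_minority(X_min, n, b_percent, rng, k=5):
        n_real = int(round(n * b_percent / 100.0))
        n_syn = n - n_real
        real = X_min[rng.choice(len(X_min), size=n_real, replace=True)]
        # k nearest neighbours among the true minority examples
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
        _, neigh = nn.kneighbors(X_min)
        synthetic = []
        for _ in range(n_syn):
            i = rng.integers(len(X_min))
            j = rng.choice(neigh[i][1:])          # a random neighbour, skipping the point itself
            synthetic.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
        return np.vstack([real, np.asarray(synthetic)]) if n_syn else real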
4.3.1.3 Chan and Stolfo's Method Chan and Stolfo's method (Chan) partitions the majority class into a set of nonoverlapping subsets, each having approximately n+ examples [5]. Each majority class subset, together with all the minority class examples, forms a bag. The base learners are combined by stacking.
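The bag construction could look roughly as follows (the stacking combiner is omitted, and the function name and minority label are assumptions for illustration):

    import numpy as np

    def chan_stolfo_bags(X, y, rng, minority_label=1):
        min_idx = np.where(y == minority_label)[0]
        maj_idx = rng.permutation(np.where(y != minority_label)[0])
        n_subsets = max(1, len(maj_idx) // len(min_idx))
        bags = []
        for chunk in np.array_split(maj_idx, n_subsets):   # disjoint majority subsets of ~n+ examples
            idx = np.concatenate([chunk, min_idx])          # each subset plus all minority examples
            bags.append((X[idx], y[idx]))
        return bags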
4.3.1.4 Balanced Random Forests (BRF) [22] Balanced Random Forests (BRF) adapts Random Forests (RF) [19] to imbalanced data. To learn a single tree in each iteration, it first draws a bootstrap sample of size n+ from the minority class, and then draws the same number of examples with replacement from the majority class. Thus, each training sample is a balanced data sample. Then, BRF induces a tree from each balanced training sample, grown to maximum size without pruning as in standard RF.
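A compact sketch of this balanced bootstrap follows; it uses scikit-learn decision trees with per-split random feature selection as a stand-in for the original RF trees [19], so it approximates rather than reproduces BRF [22], and the minority label is assumed to be 1.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def balanced_random_forest(X, y, n_trees=100, minority_label=1, seed=0):
        rng = np.random.default_rng(seed)
        min_idx = np.where(y == minority_label)[0]
        maj_idx = np.where(y != minority_label)[0]
        n_pos = len(min_idx)
        trees = []
        for _ in range(n_trees):
            # bootstrap n+ examples from each class, so every training sample is balanced
            idx = np.concatenate([rng.choice(min_idx, n_pos, replace=True),
                                  rng.choice(maj_idx, n_pos, replace=True)])
            trees.append(DecisionTreeClassifier(max_features="sqrt").fit(X[idx], y[idx]))
        return trees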