4.3.1 Bagging-Style Methods
Bagging-style CIL methods use diverse training samples (we call them bags for convenience) to train independent base learners in parallel, much like Bagging. The difference is that Bagging-style CIL methods construct a balanced data sample in each iteration to train a base learner capable of handling imbalanced data, whereas Bagging uses a bootstrap sample, whose data distribution is identical to the underlying data distribution, to train a base learner that maximizes accuracy. Different bag construction strategies lead to different Bagging-style methods. Some methods use under-sampling to reduce the majority class examples in a bag, such as UnderBagging [20]; some use over-sampling to increase the minority class examples, such as OverBagging [20]; and some use a hybrid of sampling and synthetic example generation to balance the data, such as SMOTEBagging [20]. Other methods partition the majority class into a set of disjoint subsets of size n+, combining each subset with all the minority class examples to construct a bag, such as Chan and Stolfo's method [5].
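To make the shared structure concrete, here is a minimal sketch (in Python, with hypothetical helper names such as make_bag and majority_vote, and assuming integer class labels): one base learner is trained per balanced bag, and the predictions are combined by voting. The subsections below differ mainly in how the bag is built.

    import numpy as np
    from sklearn.base import clone
    from sklearn.tree import DecisionTreeClassifier

    def bagging_style_cil(X, y, make_bag, n_estimators=10,
                          base_learner=DecisionTreeClassifier(), seed=0):
        # make_bag(X, y, rng) is assumed to return one balanced bag (Xb, yb) per call
        rng = np.random.default_rng(seed)
        return [clone(base_learner).fit(*make_bag(X, y, rng))
                for _ in range(n_estimators)]

    def majority_vote(learners, X):
        # assumes integer class labels; returns the most-voted label for each example
        votes = np.stack([clf.predict(X) for clf in learners])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)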
4.3.1.1 UnderBagging/OverBagging Both UnderBagging and OverBagging construct a bag by obtaining two samples of the same size n · a%, one from the minority class and one from the majority class, by sampling with replacement, where a% varies from 10% to 100% [20]. When a% = n+/n, under-sampling is conducted to remove majority class examples, which leads to UnderBagging; when a% = 100%, over-sampling is conducted to increase the minority class examples, which leads to OverBagging; otherwise, both under- and over-sampling are conducted, which leads to a hybrid Bagging-style ensemble. The base learners are combined by majority voting.
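A minimal sketch of this bag construction is given below, assuming that n denotes the majority class size and that the minority class is labeled 1 (both assumptions for illustration); plugging it into the generic loop above yields an UnderBagging- or OverBagging-style ensemble depending on a%.

    import numpy as np

    def under_over_bag(X, y, a_percent, rng, minority_label=1):
        min_idx = np.where(y == minority_label)[0]
        maj_idx = np.where(y != minority_label)[0]
        n = len(maj_idx)                          # majority class size (assumed meaning of n)
        size = int(round(n * a_percent / 100.0))  # both classes are resampled to n * a%
        idx = np.concatenate([rng.choice(min_idx, size=size, replace=True),
                              rng.choice(maj_idx, size=size, replace=True)])
        return X[idx], y[idx]                     # a% = n+/n -> UnderBagging; a% = 100 -> OverBagging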
4.3.1.2 SMOTEBagging SMOTEBagging is similar to OverBagging in that a sample of n minority class examples is drawn [20]. The difference lies in how the sample of minority examples is obtained: SMOTEBagging samples n · b% minority class examples from the minority class and then generates n · (1 - b%) synthetic minority class examples by SMOTE [21].
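The sketch below illustrates how such a minority sample could be assembled; the SMOTE step is a simplified nearest-neighbour interpolation written for illustration, not the reference implementation of [21], and the function name is an assumption.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def smote_bag_minority(X_min, n, b_percent, rng, k=5):
        n_real = int(round(n * b_percent / 100.0))
        n_syn = n - n_real
        real = X_min[rng.choice(len(X_min), size=n_real, replace=True)]
        # k nearest neighbours among the true minority examples
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
        _, neigh = nn.kneighbors(X_min)
        synthetic = []
        for _ in range(n_syn):
            i = rng.integers(len(X_min))
            j = rng.choice(neigh[i][1:])          # a random neighbour, skipping the point itself
            synthetic.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
        return np.vstack([real, np.asarray(synthetic)]) if n_syn else real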
4.3.1.3 Chan and Stolfo's Method Chan and Stolfo's method (Chan) partitions the majority class into a set of nonoverlapping subsets, each having approximately n+ examples [5]. Each majority class subset, together with all the minority class examples, forms a bag. The base learners are combined by stacking.
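The bag construction could look roughly as follows (the stacking combiner is omitted, and the function name and minority label are assumptions for illustration):

    import numpy as np

    def chan_stolfo_bags(X, y, rng, minority_label=1):
        min_idx = np.where(y == minority_label)[0]
        maj_idx = rng.permutation(np.where(y != minority_label)[0])
        n_subsets = max(1, len(maj_idx) // len(min_idx))
        bags = []
        for chunk in np.array_split(maj_idx, n_subsets):   # disjoint majority subsets of ~n+ examples
            idx = np.concatenate([chunk, min_idx])          # each subset plus all minority examples
            bags.append((X[idx], y[idx]))
        return bags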
4.3.1.4 Balanced Random Forests (BRF) [22] Balanced Random Forests (BRF) adapts Random Forests (RF) [19] to imbalanced data. To learn a single tree in each iteration, it first draws a bootstrap sample of size n+ from the minority class, and then draws the same number of examples with replacement from the majority class. Thus, each training sample is a balanced data sample. Then, BRF induces a tree from each balanced training sample, grown to maximum size without pruning as in standard RF.
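A compact sketch of this balanced bootstrap follows; it uses scikit-learn decision trees with per-split random feature selection as a stand-in for the original RF trees [19], so it approximates rather than reproduces BRF [22], and the minority label is assumed to be 1.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def balanced_random_forest(X, y, n_trees=100, minority_label=1, seed=0):
        rng = np.random.default_rng(seed)
        min_idx = np.where(y == minority_label)[0]
        maj_idx = np.where(y != minority_label)[0]
        n_pos = len(min_idx)
        trees = []
        for _ in range(n_trees):
            # bootstrap n+ examples from each class, so every training sample is balanced
            idx = np.concatenate([rng.choice(min_idx, n_pos, replace=True),
                                  rng.choice(maj_idx, n_pos, replace=True)])
            trees.append(DecisionTreeClassifier(max_features="sqrt").fit(X[idx], y[idx]))
        return trees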