6. Over-sampling + AdaBoost. First random over-sampling is used such
that the minority class has the same number of examples as the majority
class, and then an AdaBoost ensemble is trained from the new dataset.
7. Over-sampling + RF. First random over-sampling is used such that the
minority class has the same number of examples as the majority class,
class, and then an RF ensemble is trained from the new dataset.
8. SMOTE + under-sampling + AdaBoost. First SMOTE is used to generate n⁺ synthetic examples for the minority class, and then under-sampling is conducted so that the majority class has 2n⁺ examples; finally, an AdaBoost ensemble is trained from the new dataset (see the first sketch after this list).
9. Chan + AdaBoost. Chan is a bagging-style ensemble method. First
Chan is used to generate independent balanced data samples, and then an
AdaBoost ensemble is trained from each sample.
10. BRF (Balanced Random Forest). This is a bagging-style ensemble method for CIL (see the second sketch after this list).
11. AsymBoost, with CART as the base learning algorithm. AsymBoost is a cost-sensitive boosting method. Let the cost of the positive examples be the level of imbalance, that is, r = n⁻/n⁺, and the cost of the negative examples be 1.
12. SMOTEBoost, with CART as the base learning algorithm. SMOTEBoost is a boosting method for CIL. The k-nearest neighbor parameter of SMOTE is 5. The amount of new data generated using SMOTE in each iteration is n⁺.
13. EasyEnsemble, with CART as the base learning algorithm for AdaBoost. This is a hybrid ensemble method for CIL.
14. BalanceCascade, with CART as the base learning algorithm for AdaBoost. This is a hybrid ensemble method for CIL.
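For concreteness, here is a minimal sketch of how the resampling-plus-ensemble pipelines of methods 6-8 could be set up. It assumes the scikit-learn and imbalanced-learn libraries and a synthetic dataset, none of which appear in the chapter itself; it is an illustration of the three pipelines, not the experimental code used here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Hypothetical imbalanced dataset: class 1 is the minority (positive) class.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
n_pos = int(np.sum(y == 1))  # n+, the number of minority-class examples

# Method 6: random over-sampling until both classes are the same size, then AdaBoost
# (the default base learner is a depth-1 CART, i.e., a decision stump).
X_ros, y_ros = RandomOverSampler(random_state=0).fit_resample(X, y)
ada = AdaBoostClassifier(n_estimators=40, random_state=0).fit(X_ros, y_ros)

# Method 7: the same over-sampling, then a Random Forest.
rf = RandomForestClassifier(n_estimators=40, random_state=0).fit(X_ros, y_ros)

# Method 8: SMOTE adds n+ synthetic minority examples (the minority class grows
# to 2n+), then the majority class is under-sampled to 2n+, and AdaBoost is trained.
X_sm, y_sm = SMOTE(sampling_strategy={1: 2 * n_pos}, k_neighbors=5,
                   random_state=0).fit_resample(X, y)
X_bal, y_bal = RandomUnderSampler(sampling_strategy={0: 2 * n_pos},
                                  random_state=0).fit_resample(X_sm, y_sm)
ada_smote = AdaBoostClassifier(n_estimators=40, random_state=0).fit(X_bal, y_bal)
```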
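Several of the CIL ensembles above also have widely used open-source counterparts. The second sketch below, again an assumption-laden illustration rather than the chapter's code, uses imbalanced-learn's BalancedRandomForestClassifier and EasyEnsembleClassifier as stand-ins for methods 10 and 13, and only roughly approximates the AsymBoost cost assignment of method 11 by passing per-example costs as sample_weight to a standard AdaBoost; true AsymBoost reapplies the cost asymmetry inside every boosting round.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from imblearn.ensemble import BalancedRandomForestClassifier, EasyEnsembleClassifier

# Same kind of hypothetical imbalanced dataset as in the first sketch.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Method 10 (BRF): every tree is grown on a bootstrap sample in which the
# majority class is under-sampled to the size of the minority class.
brf = BalancedRandomForestClassifier(n_estimators=40, random_state=0).fit(X, y)

# Method 13 (EasyEnsemble): several AdaBoost ensembles, each trained on the
# minority class plus an independent random subset of the majority class.
easy = EasyEnsembleClassifier(n_estimators=10, random_state=0).fit(X, y)

# Method 11 (AsymBoost, approximated): cost r = n-/n+ for each positive example
# and cost 1 for each negative example, supplied once as sample_weight.
r = np.sum(y == 0) / np.sum(y == 1)
costs = np.where(y == 1, r, 1.0)
asym_like = AdaBoostClassifier(n_estimators=40, random_state=0).fit(
    X, y, sample_weight=costs)
```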
For a fair comparison, all methods are ensemble methods, and they use AdaBoost or CART as the base learning algorithm. In addition, the total number of CART base learners trained by these methods (except Chan) is set to 40. As for Chan, there are n⁻/n⁺ bags, and an AdaBoost classifier is trained on each bag for 40n⁺/n⁻ iterations when n⁻/n⁺ < 40; otherwise, only one iteration is allowed. Thus, the total number of CART base learners generated is around 40. The abbreviations and type information for these methods are summarized in Table 4.2, where boldface indicates that the method is an ensemble method for CIL. These methods are categorized into three groups: standard ensemble methods (Group 1), CIL methods that do not use ensemble methods to handle imbalanced data (Group 2), and ensemble methods for CIL (Group 3).⁴ Note that although ensemble learning is invoked by the methods in Group 2, it is not the part of those methods that handles the imbalanced data, which makes them fundamentally different from the methods in Group 3.
⁴ We will use “methods in Group 1,” “methods in Group 2,” and “methods in Group 3” to indicate the above three groups of methods for convenience.
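As a quick sanity check on the base-learner budget for the Chan + AdaBoost setting, the short snippet below reproduces the arithmetic with made-up class sizes: the number of balanced bags times the number of boosting iterations per bag comes out to roughly 40 CART base learners.

```python
# Hypothetical class sizes: n- = 2000 majority and n+ = 100 minority examples.
n_neg, n_pos = 2000, 100
bags = n_neg // n_pos                   # n-/n+ balanced bags for Chan
iters = max(1, (40 * n_pos) // n_neg)   # 40*n+/n- iterations per bag, at least 1
print(bags, iters, bags * iters)        # 20 bags x 2 iterations = 40 base learners
```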