Table 4.2 The Methods in Comparison

No.  Methods                            Abbr.      Type
Group 1
 1   Bagging                            Bagg       Standard ensemble method
 2   AdaBoost                           Ada        Standard ensemble method
 3   Random Forests                     RF         Standard ensemble method
Group 2
 4   Under-sampling + AdaBoost          Under-Ada  CIL method
 5   Under-sampling + Random Forests    Under-RF   CIL method
 6   Over-sampling + AdaBoost           Over-Ada   CIL method
 7   Over-sampling + Random Forests     Over-RF    CIL method
 8   SMOTE + under-sampling + AdaBoost  SMOTE      CIL method
Group 3
 9   Chan + AdaBoost                    Chan       Bagging-style method for CIL
10   Balanced Random Forests            BRF        Bagging-style method for CIL
11   AsymBoost                          Asym       Boosting-based method for CIL
12   SMOTEBoost                         SMB        Boosting-based method for CIL
13   EasyEnsemble                       Easy       Hybrid ensemble method for CIL
14   BalanceCascade                     Cascade    Hybrid ensemble method for CIL
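To illustrate the preprocessing step behind the Group 2 methods, the sketch below (my own helper, not the chapter's exact procedure) balances a binary dataset by random under-sampling of the majority class; the balanced set would then be handed to a base ensemble learner such as AdaBoost or Random Forests:

```python
import numpy as np

def random_under_sample(X, y, seed=0):
    """Balance a binary dataset by randomly discarding majority-class
    examples until both classes have equal size (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if len(idx) > n_min:
            # Keep only a random subset of the over-represented class.
            idx = rng.choice(idx, size=n_min, replace=False)
        keep.append(idx)
    keep = np.concatenate(keep)
    return X[keep], y[keep]

# Imbalanced toy data: 20 negatives, 5 positives.
X = np.arange(50, dtype=float).reshape(25, 2)
y = np.array([0] * 20 + [1] * 5)
Xb, yb = random_under_sample(X, y)
# After balancing, each class contributes 5 examples.
```

Over-sampling (rows 6 and 7 of the table) is the mirror image: minority-class examples are duplicated at random until the class sizes match.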
Table 4.3 summarizes the area under the curve (AUC) values [40] obtained by ten runs of 10-fold cross-validation. AUC is a popular performance measure in CIL; the higher the AUC value, the better the performance. Figures 4.1 and 4.2 show the scatter plots of the methods in Groups 1 and 2 versus each of the six ensemble methods for CIL in Group 3, respectively. A point above the dotted line indicates that the method on the y-axis is better than the method on the x-axis. In addition, Table 4.4 gives the detailed win-tie-lose counts of each class imbalance method (Groups 2 and 3) versus the standard methods (Group 1), and of each ensemble method for CIL (Group 3) versus the methods in Group 2, via t-tests at the 0.05 significance level. Boldface indicates that the result is significant under a sign test at the 0.05 level.
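AUC can be computed directly from pairwise score comparisons (the Wilcoxon/Mann-Whitney formulation). A minimal sketch, with a helper name of my own choosing:

```python
def auc(scores_pos, scores_neg):
    """Probability that a randomly chosen positive example is scored
    higher than a randomly chosen negative one; ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# A perfect ranking of positives above negatives gives AUC = 1.0;
# partial overlap gives proportionally less.
print(auc([0.9, 0.8], [0.1, 0.2]))  # 1.0
print(auc([0.9, 0.3], [0.4, 0.2]))  # 0.75
```

Because AUC depends only on the ranking of scores and not on a fixed decision threshold, it is insensitive to the class ratio, which is why it is favored in CIL evaluation.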
Almost all the CIL methods perform significantly better than CART [6].5 When compared with the standard ensemble methods (Group 1), however, the under-sampling and over-sampling methods in Group 2 are not very effective, probably because standard ensemble methods already have strong generalization ability, which can reduce the effect of class imbalance. SMOTE, by contrast, is significantly better than the standard ensemble methods. This suggests that SMOTE sampling, and combinations of different sampling methods, are good choices for handling imbalanced data when an ensemble learning algorithm is invoked to generate a classifier.
5 Since CART cannot produce AUC values, we did not include CART in the comparison in this chapter. Liu et al. [6] showed that almost all the CIL methods have significantly higher F-measure and G-mean values than CART.
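The core idea of SMOTE is to create synthetic minority examples by interpolating between a minority sample and one of its nearest minority-class neighbors, rather than duplicating existing samples. A bare-bones sketch of that idea (my own simplified version, not the reference implementation):

```python
import numpy as np

def smote(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority examples by interpolating each
    chosen sample toward one of its k nearest minority neighbours
    (simplified sketch of SMOTE)."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X_min))
        x = X_min[j]
        d = np.linalg.norm(X_min - x, axis=1)
        nbrs = np.argsort(d)[1:k + 1]          # skip the sample itself
        x2 = X_min[rng.choice(nbrs)]
        out[i] = x + rng.random() * (x2 - x)   # random point on segment x -> x2
    return out

# Minority class at the corners of the unit square; synthetic points
# land on segments between a sample and one of its neighbours.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synth = smote(X_min, n_new=6)
```

Because the synthetic points broaden the minority region instead of repeating it, SMOTE tends to overfit less than plain over-sampling, which is consistent with its stronger showing in Table 4.3.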