Table 4.2 The Methods in Comparison

No.  Methods                            Abbr.      Type
Group 1
 1   Bagging                            Bagg       Standard ensemble method
 2   AdaBoost                           Ada        Standard ensemble method
 3   Random Forests                     RF         Standard ensemble method
Group 2
 4   Under-sampling + AdaBoost          Under-Ada  CIL method
 5   Under-sampling + Random Forests    Under-RF   CIL method
 6   Over-sampling + AdaBoost           Over-Ada   CIL method
 7   Over-sampling + Random Forests     Over-RF    CIL method
 8   SMOTE + under-sampling + AdaBoost  SMOTE      CIL method
Group 3
 9   Chan + AdaBoost                    Chan       Bagging-style method for CIL
10   Balanced Random Forests            BRF        Bagging-style method for CIL
11   AsymBoost                          Asym       Boosting-based method for CIL
12   SMOTEBoost                         SMB        Boosting-based method for CIL
13   EasyEnsemble                       Easy       Hybrid ensemble method for CIL
14   BalanceCascade                     Cascade    Hybrid ensemble method for CIL
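To illustrate the preprocessing step behind the Group 2 methods, the sketch below (my own helper, not the chapter's exact procedure) balances a binary dataset by random under-sampling of the majority class; the balanced set would then be handed to a base ensemble learner such as AdaBoost or Random Forests:

```python
import numpy as np

def random_under_sample(X, y, seed=0):
    """Balance a binary dataset by randomly discarding majority-class
    examples until both classes have equal size (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if len(idx) > n_min:
            # Keep only a random subset of the over-represented class.
            idx = rng.choice(idx, size=n_min, replace=False)
        keep.append(idx)
    keep = np.concatenate(keep)
    return X[keep], y[keep]

# Imbalanced toy data: 20 negatives, 5 positives.
X = np.arange(50, dtype=float).reshape(25, 2)
y = np.array([0] * 20 + [1] * 5)
Xb, yb = random_under_sample(X, y)
# After balancing, each class contributes 5 examples.
```

Over-sampling (rows 6 and 7 of the table) is the mirror image: minority-class examples are duplicated at random until the class sizes match.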
Table 4.3 summarizes the area under the curve (AUC) values [40] obtained by ten runs of 10-fold cross-validation. AUC is a popular performance measure in CIL; the higher the AUC value, the better the performance. Figures 4.1 and 4.2 show the scatter plots of the methods in Groups 1 and 2 versus each of the six ensemble methods for CIL in Group 3, respectively. A point above the dotted line indicates that the method on the y-axis is better than the method on the x-axis. In addition, Table 4.4 gives the detailed win-tie-lose counts of each class imbalance method (Groups 2 and 3) versus the standard methods (Group 1), and of each ensemble method for CIL (Group 3) versus the methods in Group 2, via t-tests at the 0.05 significance level. Boldface indicates that the result is significant under a sign test at the 0.05 level.
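AUC can be computed directly from pairwise score comparisons (the Wilcoxon/Mann-Whitney formulation). A minimal sketch, with a helper name of my own choosing:

```python
def auc(scores_pos, scores_neg):
    """Probability that a randomly chosen positive example is scored
    higher than a randomly chosen negative one; ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# A perfect ranking of positives above negatives gives AUC = 1.0;
# partial overlap gives proportionally less.
print(auc([0.9, 0.8], [0.1, 0.2]))  # 1.0
print(auc([0.9, 0.3], [0.4, 0.2]))  # 0.75
```

Because AUC depends only on the ranking of scores and not on a fixed decision threshold, it is insensitive to the class ratio, which is why it is favored in CIL evaluation.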
Almost all the CIL methods perform significantly better than CART [6].5 When compared with the standard ensemble methods (Group 1), however, the under-sampling and over-sampling methods in Group 2 are not very effective, probably because standard ensemble methods already have strong generalization ability, which can reduce the effect of class imbalance. SMOTE, by contrast, is significantly better than the standard ensemble methods. This suggests that SMOTE sampling, and combinations of different sampling methods, are good choices for handling imbalanced data when an ensemble learning algorithm is invoked to generate a classifier.
5 Since CART cannot produce AUC values, we did not include CART in the comparison in this chapter. Liu et al. [6] showed that almost all the CIL methods have significantly higher F-measure and G-mean values than CART.
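The core idea of SMOTE is to create synthetic minority examples by interpolating between a minority sample and one of its nearest minority-class neighbors, rather than duplicating existing samples. A bare-bones sketch of that idea (my own simplified version, not the reference implementation):

```python
import numpy as np

def smote(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority examples by interpolating each
    chosen sample toward one of its k nearest minority neighbours
    (simplified sketch of SMOTE)."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X_min))
        x = X_min[j]
        d = np.linalg.norm(X_min - x, axis=1)
        nbrs = np.argsort(d)[1:k + 1]          # skip the sample itself
        x2 = X_min[rng.choice(nbrs)]
        out[i] = x + rng.random() * (x2 - x)   # random point on segment x -> x2
    return out

# Minority class at the corners of the unit square; synthetic points
# land on segments between a sample and one of its neighbours.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synth = smote(X_min, n_new=6)
```

Because the synthetic points broaden the minority region instead of repeating it, SMOTE tends to overfit less than plain over-sampling, which is consistent with its stronger showing in Table 4.3.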