4.4 EMPIRICAL STUDY
Many studies have reported that ensemble methods are effective at handling
imbalanced data. To illustrate the advantages of ensemble methods for CIL, we
compare some typical ones with standard methods and with some CIL methods that
do not use ensemble techniques.³
We use 10 binary class-imbalanced datasets from the University of California,
Irvine (UCI) repository [39]; their basic information is summarized in
Table 4.1. The methods compared are:
1. Bagging, with CART as the base learning algorithm.
2. AdaBoost, with CART as the base learning algorithm.
3. RF [19]. RF is a state-of-the-art ensemble method. It injects randomness
into the base learning algorithm rather than into the training data.
Specifically, RF trains random decision trees as base learners by random
feature selection. When constructing a component decision tree, at each step
of split selection, RF first selects a feature subset at random and then
carries out the conventional split selection procedure within that subset.
4. Under-sampling + AdaBoost. First, random under-sampling is applied so
that the majority class has the same number of examples as the minority
class; then an AdaBoost ensemble is trained on the new dataset.
5. Under-sampling + RF. First, random under-sampling is applied so that
the majority class has the same number of examples as the minority class;
then an RF ensemble is trained on the new dataset.
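The two randomized steps described in the list above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the implementation used in the experiments: the function names `random_undersample` and `random_feature_subset`, the fixed seeds, and the toy data (which only mimics the class sizes of the balance dataset) are all hypothetical. The sketch assumes that the majority class is at least as large as the minority class, and it leaves the training of the AdaBoost or RF ensemble on the balanced data to the reader's library of choice.

```python
import random
from collections import Counter


def random_undersample(X, y, minority_label, seed=0):
    """Randomly discard majority examples until both classes have equal size.

    Assumes the majority class has at least as many examples as the minority.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    majority_idx = [i for i, label in enumerate(y) if label != minority_label]
    # Draw, without replacement, as many majority examples as minority ones.
    kept_majority = rng.sample(majority_idx, len(minority_idx))
    keep = sorted(minority_idx + kept_majority)
    return [X[i] for i in keep], [y[i] for i in keep]


def random_feature_subset(n_features, subset_size, rng):
    """Choose the candidate features for one split, in the spirit of RF.

    The conventional split selection would then run only on these features.
    """
    return sorted(rng.sample(range(n_features), subset_size))


# Toy data (hypothetical) with the class sizes of 'balance': 49 vs. 576.
X = [[float(i), float(i % 7), float(i % 3), float(i % 5)] for i in range(625)]
y = [1] * 49 + [0] * 576

X_bal, y_bal = random_undersample(X, y, minority_label=1)
counts = Counter(y_bal)  # both classes now contain 49 examples

# Candidate features for a single split: 2 of the 4 attributes.
candidates = random_feature_subset(n_features=4, subset_size=2,
                                   rng=random.Random(1))
```

Under-sampling discards most of the majority class (here 527 of 576 examples), which is the price paid for a balanced training set.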
Table 4.1 Basic Information of Datasets ᵃ

Dataset      Size   Attribute   Target     #min/#maj   Ratio
abalone      4177   1N,7C       Ring = 7   391/3786      9.7
balance       625   4C          Balance    49/576       11.8
cmc          1473   3B,4N,2C    class 2    333/1140      3.4
haberman      306   1N,2C       class 2    81/225        2.8
housing       506   1B,12C      [20, 23]   106/400       3.8
mf-morph     2000   6C          class 10   200/1800      9.0
mf-zernike   2000   47C         class 10   200/1800      9.0
pima          768   8C          class 1    268/500       1.9
vehicle       846   18C         opel       212/634       3.0
wpbc          198   33C         recur      47/151        3.2

ᵃ Size is the number of examples. Target is the class used as the minority
class; all other classes form the majority class. In Attribute, B: binary,
N: nominal, C: continuous. #min/#maj gives the sizes of the minority and
majority classes, respectively. Ratio is the size of the majority class
divided by that of the minority class.
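
As a quick check of the Ratio definition, the short Python sketch below recomputes the ratio and the dataset size for four rows of Table 4.1. The class sizes are copied from the table; the variable names are our own.

```python
# (minority size, majority size), copied from Table 4.1
sizes = {
    "abalone": (391, 3786),
    "balance": (49, 576),
    "pima": (268, 500),
    "wpbc": (47, 151),
}

# Ratio = majority size / minority size, rounded to one decimal as in the table.
ratios = {name: round(n_maj / n_min, 1) for name, (n_min, n_maj) in sizes.items()}

# Size = minority size + majority size.
totals = {name: n_min + n_maj for name, (n_min, n_maj) in sizes.items()}

print(ratios)  # {'abalone': 9.7, 'balance': 11.8, 'pima': 1.9, 'wpbc': 3.2}
print(totals)  # {'abalone': 4177, 'balance': 625, 'pima': 768, 'wpbc': 198}
```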
³ The results are mainly from [6].