Algorithm: The BalanceCascade algorithm

Input:
  Data set: D = {(x_i, y_i)}_{i=1}^{n}, with minority class P and majority class N (n+ = |P|, n− = |N|)
  The number of iterations: T
  The number of iterations used to train each AdaBoost ensemble H_i: s_i

1: f ← (n+ / n−)^{1/(T−1)}, where f is the false positive rate that each H_i should achieve.
   /* the false positive rate is the error rate of misclassifying a majority class example to the minority class */
2: for i = 1 to T do
3:   Randomly sample a subset N_i of n+ examples from the majority class N.
4:   Learn H_i using P and N_i. H_i is an AdaBoost ensemble with s_i weak classifiers h_{i,j} and corresponding weights α_{i,j}. The ensemble's threshold is θ_i, i.e.,
         H_i(x) = sign( Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) − θ_i ).
5:   Adjust θ_i such that H_i's false positive rate is f.
6:   Remove from N all examples that are correctly classified by H_i.
7: end for
8: return H(x) = sign( Σ_{i=1}^{T} Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) − Σ_{i=1}^{T} θ_i )
4.3.3.1 BalanceCascade BalanceCascade tries to delete examples of the majority class in a guided way [6]. Different from EasyEnsemble (which generates subsamples of the majority class in an unsupervised, parallel manner), BalanceCascade works in a supervised, sequential manner. The basic idea is to shrink the majority class step by step in a cascade style. In each iteration, a subset N_i of n+ examples is sampled from the majority class. Then, an AdaBoost ensemble H_i is trained from the union of N_i and P. After that, the majority class examples that are correctly classified by H_i are considered redundant information and are removed from the majority class. The final ensemble is formed by combining all the base learners in all the AdaBoost ensembles, as in EasyEnsemble. The algorithm is shown above.
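To make the procedure concrete, here is a minimal sketch in Python, assuming scikit-learn's AdaBoostClassifier as each H_i and using its decision_function as a stand-in for the weighted score Σ_j α_{i,j} h_{i,j}(x) (up to scikit-learn's internal normalization). The names balance_cascade and cascade_predict are illustrative, not part of the original method.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def balance_cascade(X_min, X_maj, T=4, s_i=10, seed=0):
    """Sketch of BalanceCascade: train T AdaBoost ensembles, shrinking
    the majority class after each round (labels: minority = +1, majority = -1)."""
    rng = np.random.default_rng(seed)
    n_pos = len(X_min)
    f = (n_pos / len(X_maj)) ** (1.0 / (T - 1))   # step 1: target false positive rate
    ensembles, thresholds = [], []
    for _ in range(T):
        if len(X_maj) == 0:
            break
        # step 3: sample n+ majority examples to form a balanced training set
        idx = rng.choice(len(X_maj), size=min(n_pos, len(X_maj)), replace=False)
        X = np.vstack([X_min, X_maj[idx]])
        y = np.hstack([np.ones(n_pos), -np.ones(len(idx))])
        # step 4: learn H_i as an AdaBoost ensemble with s_i weak classifiers
        H = AdaBoostClassifier(n_estimators=s_i).fit(X, y)
        # step 5: choose theta_i so that a fraction f of the remaining majority
        # examples still score above it (false positive rate = f)
        scores = H.decision_function(X_maj)
        theta = np.quantile(scores, 1.0 - f)
        # step 6: remove majority examples correctly classified by H_i,
        # keeping only the fraction it still misclassifies
        X_maj = X_maj[scores > theta]
        ensembles.append(H)
        thresholds.append(theta)
    return ensembles, thresholds

def cascade_predict(ensembles, thresholds, X):
    """Step 8: combine all base learners, H(x) = sign(sum_i score_i(x) - sum_i theta_i)."""
    total = sum(H.decision_function(X) for H in ensembles)
    return np.where(total - sum(thresholds) > 0, 1, -1)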
4.3.4 Other Ensemble Methods
There are also ensemble methods that combine different CIL methods to handle imbalanced data. The most straightforward way is to directly ensemble the classifiers generated by different methods. For example, Zhou and Liu [31] combined the neural network classifiers generated by over-sampling, under-sampling, and threshold-moving via hard ensemble and soft ensemble (see the sketch below). Some ensemble methods combine classifiers trained from data with different levels of imbalance. For example, Estabrooks et al. [32] generated multiple versions of the training data with different levels of imbalance and combined the classifiers trained from them.
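As an illustration of the direct-combination idea, the following sketch assumes scikit-learn-style classifiers (with predict and predict_proba) trained by different CIL methods, e.g., one on over-sampled and one on under-sampled data. A hard ensemble takes a majority vote over predicted labels, while a soft ensemble averages predicted class probabilities before thresholding. The function names are hypothetical.

import numpy as np

def hard_ensemble(classifiers, X):
    """Majority vote over the labels predicted by each classifier
    (labels assumed to be in {-1, +1})."""
    votes = np.stack([clf.predict(X) for clf in classifiers])
    return np.where(votes.mean(axis=0) > 0, 1, -1)

def soft_ensemble(classifiers, X, threshold=0.5):
    """Average the predicted probability of the positive (minority) class,
    then threshold; assumes predict_proba's column 1 is the +1 class."""
    probs = np.stack([clf.predict_proba(X)[:, 1] for clf in classifiers])
    return np.where(probs.mean(axis=0) >= threshold, 1, -1)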