Algorithm: The BalanceCascade algorithm

Input:
  Data set: D = {(x_i, y_i)}_{i=1}^{n}, with minority class P and majority class N (n+ = |P|, n− = |N|)
  The number of iterations: T
  The number of iterations used to train each AdaBoost ensemble H_i: s_i

1: f ← (n+ / n−)^{1/(T−1)}, where f is the false positive rate that each H_i should achieve.
   /* the false positive rate is the error rate of misclassifying a majority class example to the minority class */
2: for i = 1 to T do
3:   Randomly sample a subset N_i of n+ examples from the majority class N.
4:   Learn H_i using P and N_i. H_i is an AdaBoost ensemble with s_i weak classifiers h_{i,j} and corresponding weights α_{i,j}. The ensemble's threshold is θ_i, i.e.,
         H_i(x) = sign( Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) − θ_i ).
5:   Adjust θ_i such that H_i's false positive rate is f.
6:   Remove from N all examples that are correctly classified by H_i.
7: end for
8: return H(x) = sign( Σ_{i=1}^{T} Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) − Σ_{i=1}^{T} θ_i )
4.3.3.1 BalanceCascade BalanceCascade tries to delete examples of the majority class in a guided way [6]. Different from EasyEnsemble (which generates subsamples of the majority class in an unsupervised, parallel manner), BalanceCascade works in a supervised, sequential manner. The basic idea is to shrink the majority class step by step in a cascade style. In each iteration, a subset N_i of n+ examples is sampled from the majority class. Then, an AdaBoost ensemble H_i is trained from the union of N_i and P. After that, the majority class examples that are correctly classified by H_i are considered redundant information and are removed from the majority class. The final ensemble is formed by combining all the base learners in all the AdaBoost ensembles, as in EasyEnsemble. The algorithm is shown above.
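To make the procedure concrete, here is a minimal sketch in Python, assuming scikit-learn's AdaBoostClassifier as each H_i and using its decision_function as a stand-in for the weighted score Σ_j α_{i,j} h_{i,j}(x) (up to scikit-learn's internal normalization). The names balance_cascade and cascade_predict are illustrative, not part of the original method.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def balance_cascade(X_min, X_maj, T=4, s_i=10, seed=0):
    """Sketch of BalanceCascade: train T AdaBoost ensembles, shrinking
    the majority class after each round (labels: minority = +1, majority = -1)."""
    rng = np.random.default_rng(seed)
    n_pos = len(X_min)
    f = (n_pos / len(X_maj)) ** (1.0 / (T - 1))   # step 1: target false positive rate
    ensembles, thresholds = [], []
    for _ in range(T):
        if len(X_maj) == 0:
            break
        # step 3: sample n+ majority examples to form a balanced training set
        idx = rng.choice(len(X_maj), size=min(n_pos, len(X_maj)), replace=False)
        X = np.vstack([X_min, X_maj[idx]])
        y = np.hstack([np.ones(n_pos), -np.ones(len(idx))])
        # step 4: learn H_i as an AdaBoost ensemble with s_i weak classifiers
        H = AdaBoostClassifier(n_estimators=s_i).fit(X, y)
        # step 5: choose theta_i so that a fraction f of the remaining majority
        # examples still score above it (false positive rate = f)
        scores = H.decision_function(X_maj)
        theta = np.quantile(scores, 1.0 - f)
        # step 6: remove majority examples correctly classified by H_i,
        # keeping only the fraction it still misclassifies
        X_maj = X_maj[scores > theta]
        ensembles.append(H)
        thresholds.append(theta)
    return ensembles, thresholds

def cascade_predict(ensembles, thresholds, X):
    """Step 8: combine all base learners, H(x) = sign(sum_i score_i(x) - sum_i theta_i)."""
    total = sum(H.decision_function(X) for H in ensembles)
    return np.where(total - sum(thresholds) > 0, 1, -1)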
4.3.4 Other Ensemble Methods
There are also ensemble methods that combine different CIL methods to handle imbalanced data. The most straightforward way is to directly ensemble the classifiers generated by different methods. For example, Zhou and Liu [31] combined the neural network classifiers generated by over-sampling, under-sampling, and threshold-moving via hard ensemble and soft ensemble (see the sketch below). Some ensemble methods combine classifiers trained from data with different levels of imbalance. For example, Estabrooks et al. [32] generated multiple versions of the training data with different levels of imbalance and combined the classifiers trained from them.
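As an illustration of the direct-combination idea, the following sketch assumes scikit-learn-style classifiers (with predict and predict_proba) trained by different CIL methods, e.g., one on over-sampled and one on under-sampled data. A hard ensemble takes a majority vote over predicted labels, while a soft ensemble averages predicted class probabilities before thresholding. The function names are hypothetical.

import numpy as np

def hard_ensemble(classifiers, X):
    """Majority vote over the labels predicted by each classifier
    (labels assumed to be in {-1, +1})."""
    votes = np.stack([clf.predict(X) for clf in classifiers])
    return np.where(votes.mean(axis=0) > 0, 1, -1)

def soft_ensemble(classifiers, X, threshold=0.5):
    """Average the predicted probability of the positive (minority) class,
    then threshold; assumes predict_proba's column 1 is the +1 class."""
    probs = np.stack([clf.predict_proba(X)[:, 1] for clf in classifiers])
    return np.where(probs.mean(axis=0) >= threshold, 1, -1)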