Algorithm: The SMOTEBoost algorithm
Input:
    Data set D = {(x_i, y_i)}_{i=1}^n with y_i ∈ {1, ..., M}, where C_m is the minority class
    The number of synthetic examples to be generated in each iteration, N
    The number of iterations, T
1:  Let B = {(i, y) : i = 1, ..., n, y ≠ y_i}
2:  D_1(i, y) = 1/|B| for (i, y) ∈ B
3:  for t = 1 to T do
4:      Modify distribution D_t by creating N synthetic examples from C_m using the SMOTE algorithm
5:      Train a weak learner using distribution D_t
6:      Compute weak hypothesis h_t : X × Y → [0, 1]
7:      Compute the pseudo-loss of hypothesis h_t:
            e_t = Σ_{(i,y)∈B} D_t(i, y) (1 − h_t(x_i, y_i) + h_t(x_i, y))
8:      Set α_t = ln((1 − e_t) / e_t)
9:      Set d_t = (1/2)(1 − h_t(x_i, y) + h_t(x_i, y_i))
10:     Update D_t:  D_{t+1}(i, y) = (D_t(i, y) / Z_t) · e^{−α_t d_t}
        /* Z_t is a normalization constant such that D_{t+1} is a distribution */
11: end for
Output: H(x) = argmax_{y∈Y} Σ_{t=1}^T α_t h_t(x, y)
In each iteration, N is typically set such that the majority class and the minority class are balanced, which increases the individual weights of all minority class examples.
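The SMOTE step in line 4 of the algorithm can be sketched as follows. This is a minimal illustration assuming purely numeric features; the function name, the `k` parameter, and the brute-force neighbour search are illustrative choices, not part of the original algorithm statement:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority examples by interpolating
    each sampled minority point toward one of its k nearest
    minority-class neighbours (the core idea of SMOTE)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class (brute force)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :min(k, n - 1)]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                       # pick a minority point
        nb = X_min[rng.choice(neighbours[i])]     # one of its neighbours
        gap = rng.random()                        # interpolation factor in [0, 1)
        synthetic[j] = X_min[i] + gap * (nb - X_min[i])
    return synthetic
```

Because each synthetic point is a convex combination of two minority examples, the new examples stay inside the region spanned by the minority class rather than being exact copies, which is what distinguishes SMOTE from plain random over-sampling.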
4.3.3 Hybrid Ensemble Methods
It is well known that boosting mainly reduces bias, whereas bagging mainly
reduces variance. Several methods [26-29] combine different ensemble strategies
to achieve stronger generalization. For example, MultiBoosting [27, 28] combines
boosting with bagging using boosted ensembles as base learners.
Multiple ensemble strategies can also cooperate to handle imbalanced data,
such as EasyEnsemble and BalanceCascade [6]. Both of them try to improve
under-sampling by adopting boosting and bagging-style ensemble strategies.
EasyEnsemble [6]. The motivation of EasyEnsemble is to reduce the possibility of ignoring potentially useful information contained in the majority class examples while keeping the high efficiency of under-sampling. First, it generates a set of balanced data samples, each containing all minority class examples and a random subset of the majority class examples of size n+ (the size of the minority class); this step is similar to Bagging-style methods for CIL. Then an AdaBoost ensemble is trained on each balanced sample, and all of the resulting base learners are combined for the final prediction.
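A minimal sketch of this procedure is shown below, assuming a binary problem in which label 1 marks the minority class; the function names and the use of scikit-learn's AdaBoostClassifier as the sub-ensemble are illustrative choices, not prescribed by the original method description:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def easy_ensemble(X, y, n_subsets=10, rng=None):
    """EasyEnsemble-style training sketch: repeatedly under-sample the
    majority class down to the minority-class size, train an AdaBoost
    ensemble on each balanced subset, and keep all sub-ensembles."""
    rng = np.random.default_rng(rng)
    min_idx = np.flatnonzero(y == 1)   # assumption: 1 = minority class
    maj_idx = np.flatnonzero(y == 0)
    models = []
    for _ in range(n_subsets):
        # a fresh random majority subset of size n+ for each bag
        sub = rng.choice(maj_idx, size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, sub])
        models.append(AdaBoostClassifier(n_estimators=10).fit(X[idx], y[idx]))
    return models

def ensemble_predict(models, X):
    # combine all sub-ensembles by averaging positive-class probabilities
    p = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return (p >= 0.5).astype(int)
```

Because every bag draws a different majority subset, the union of bags covers far more of the majority class than a single under-sampled set would, which is exactly the information-loss problem EasyEnsemble is designed to mitigate.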