Algorithm: The SMOTEBoost algorithm
Input:
    Data set D = {(x_i, y_i)}_{i=1}^n with y_i ∈ {1, ..., M}, where C_m is the minority class
    The number of synthetic examples to be generated in each iteration, N
    The number of iterations, T
1:  Let B = {(i, y) : i = 1, ..., n, y ≠ y_i}
2:  D_1(i, y) = 1/|B| for (i, y) ∈ B
3:  for t = 1 to T do
4:      Modify distribution D_t by creating N synthetic examples from C_m using the SMOTE algorithm
5:      Train a weak learner using distribution D_t
6:      Compute weak hypothesis h_t : X × Y → [0, 1]
7:      Compute the pseudo-loss of hypothesis h_t:
            e_t = Σ_{(i,y)∈B} D_t(i, y) (1 − h_t(x_i, y_i) + h_t(x_i, y))
8:      Set α_t = ln((1 − e_t) / e_t)
9:      Set d_t = (1/2)(1 − h_t(x_i, y) + h_t(x_i, y_i))
10:     Update D_t:  D_{t+1}(i, y) = (D_t(i, y) / Z_t) · e^{−α_t d_t}
        /* Z_t is a normalization constant such that D_{t+1} is a distribution */
11: end for
Output: H(x) = argmax_{y∈Y} Σ_{t=1}^T α_t h_t(x, y)
In each iteration, N is typically set such that the majority class and the minority class are balanced, which increases the individual weights of all minority class examples.
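The SMOTE step in line 4 of the algorithm can be sketched as follows. This is a minimal illustration assuming purely numeric features; the function name, the `k` parameter, and the brute-force neighbour search are illustrative choices, not part of the original algorithm statement:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority examples by interpolating
    each sampled minority point toward one of its k nearest
    minority-class neighbours (the core idea of SMOTE)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class (brute force)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :min(k, n - 1)]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                       # pick a minority point
        nb = X_min[rng.choice(neighbours[i])]     # one of its neighbours
        gap = rng.random()                        # interpolation factor in [0, 1)
        synthetic[j] = X_min[i] + gap * (nb - X_min[i])
    return synthetic
```

Because each synthetic point is a convex combination of two minority examples, the new examples stay inside the region spanned by the minority class rather than being exact copies, which is what distinguishes SMOTE from plain random over-sampling.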
4.3.3 Hybrid Ensemble Methods
It is well known that boosting mainly reduces bias, whereas bagging mainly
reduces variance. Several methods [26-29] combine different ensemble strategies
to achieve stronger generalization. For example, MultiBoosting [27, 28] combines
boosting with bagging using boosted ensembles as base learners.
Multiple ensemble strategies can also cooperate to handle imbalanced data,
such as EasyEnsemble and BalanceCascade [6]. Both of them try to improve
under-sampling by adopting boosting and bagging-style ensemble strategies.
EasyEnsemble [6]. The motivation of EasyEnsemble is to reduce the possibility of ignoring potentially useful information contained in the majority class examples while keeping the high efficiency of under-sampling. First, it generates a set of balanced data samples, each containing all minority class examples and a random subset of the majority class examples of size n+ (the size of the minority class); this step is similar to Bagging-style methods for CIL. Then an AdaBoost ensemble is trained on each balanced sample, and all of the resulting base learners are combined for the final prediction.
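A minimal sketch of this procedure is shown below, assuming a binary problem in which label 1 marks the minority class; the function names and the use of scikit-learn's AdaBoostClassifier as the sub-ensemble are illustrative choices, not prescribed by the original method description:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def easy_ensemble(X, y, n_subsets=10, rng=None):
    """EasyEnsemble-style training sketch: repeatedly under-sample the
    majority class down to the minority-class size, train an AdaBoost
    ensemble on each balanced subset, and keep all sub-ensembles."""
    rng = np.random.default_rng(rng)
    min_idx = np.flatnonzero(y == 1)   # assumption: 1 = minority class
    maj_idx = np.flatnonzero(y == 0)
    models = []
    for _ in range(n_subsets):
        # a fresh random majority subset of size n+ for each bag
        sub = rng.choice(maj_idx, size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, sub])
        models.append(AdaBoostClassifier(n_estimators=10).fit(X[idx], y[idx]))
    return models

def ensemble_predict(models, X):
    # combine all sub-ensembles by averaging positive-class probabilities
    p = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return (p >= 0.5).astype(int)
```

Because every bag draws a different majority subset, the union of bags covers far more of the majority class than a single under-sampled set would, which is exactly the information-loss problem EasyEnsemble is designed to mitigate.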