Algorithm: The EasyEnsemble algorithm
Input:
Data set: D = {(x_i, y_i)}_{i=1}^{n} with minority class P and majority class N
The number of iterations: T
The number of iterations to train an AdaBoost ensemble H_i: s_i
1: for i = 1 to T do
2:    Randomly sample a subset N_i of n_+ examples from the majority class N
3:    Learn H_i using P and N_i. H_i is an AdaBoost ensemble with s_i weak classifiers h_{i,j} and corresponding weights α_{i,j}. The ensemble's threshold is θ_i, i.e.,
4:       H_i(x) = sign( Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) - θ_i ).
5: end for
Output: H(x) = sign( Σ_{i=1}^{T} Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) - Σ_{i=1}^{T} θ_i )
from each bag. The final ensemble is formed by combining all base learners in
all AdaBoost ensembles. The algorithm is shown above.
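To make the training loop concrete, the following is a minimal Python sketch of steps 1-4 of the algorithm above. It assumes scikit-learn is available and that labels are encoded as -1/+1 with +1 marking the minority class; the helper name train_easy_ensemble and its parameters are illustrative and not part of the original pseudocode.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.utils import resample


def train_easy_ensemble(X, y, T=10, s=10, minority_label=1, random_state=0):
    """Return the T AdaBoost ensembles H_1, ..., H_T trained by EasyEnsemble."""
    rng = np.random.RandomState(random_state)
    X_min, y_min = X[y == minority_label], y[y == minority_label]  # minority class P
    X_maj, y_maj = X[y != minority_label], y[y != minority_label]  # majority class N
    ensembles = []
    for i in range(T):
        # Step 2: undersample the majority class down to the minority-class size.
        X_sub, y_sub = resample(X_maj, y_maj, replace=False,
                                n_samples=len(X_min),
                                random_state=rng.randint(10 ** 6))
        X_i = np.vstack([X_min, X_sub])
        y_i = np.concatenate([y_min, y_sub])
        # Step 3: learn an AdaBoost ensemble H_i with s weak classifiers on P and N_i.
        H_i = AdaBoostClassifier(n_estimators=s, random_state=i).fit(X_i, y_i)
        ensembles.append(H_i)
    return ensembles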
One possible way is to combine all AdaBoost classifiers' predictions, that is,
F(x) = sign( Σ_{i=1}^{T} sign( Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) - θ_i ) ).      (4.1)
If the ensemble was formed in this way, then the method is just a Bagging-style
ensemble method with AdaBoost ensembles as base learners. It is worth noting
that it is not the AdaBoost classifiers but the base learners forming them that are
combined for prediction, that is,
H(x) = sign( Σ_{i=1}^{T} Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) - Σ_{i=1}^{T} θ_i ).      (4.2)
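As a sketch under the same assumptions as before, the two combination rules can be written as follows, with each fitted ensemble's decision_function standing in for Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) - θ_i (scikit-learn normalizes this sum by Σ_j α_{i,j}, which does not change its sign). Both helpers are illustrative and take the list returned by the train_easy_ensemble sketch above.

import numpy as np


def predict_eq_4_1(ensembles, X):
    # Equation 4.1: each AdaBoost ensemble H_i first votes with its own sign(.),
    # and only the votes are combined.
    votes = sum(np.sign(H_i.decision_function(X)) for H_i in ensembles)
    return np.sign(votes)


def predict_eq_4_2(ensembles, X):
    # Equation 4.2: the weighted weak-learner outputs of all ensembles are pooled
    # before a single sign(.) is taken, preserving each H_i's confidence.
    scores = sum(H_i.decision_function(X) for H_i in ensembles)
    return np.sign(scores)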
A different combination method leads to a totally different ensemble method.
Some experimental results showed that combining the AdaBoost classifiers directly
using Equation 4.1 caused an obvious decrease in performance. This is because
each AdaBoost classifier gives only a single output, ignoring the detailed
information provided by its base learners. The way AdaBoost is used in EasyEnsemble
is similar to the usage of AdaBoost in [30], where the base learners are treated
as features with binary values. EasyEnsemble also treats the base learners as
features to explore different aspects of the learning process.
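The "base learners as features" view mentioned above can be sketched in the same setting: each weak classifier's prediction becomes one feature of x. The helper below is hypothetical and again assumes scikit-learn, whose fitted AdaBoostClassifier exposes its weak classifiers through the estimators_ attribute.

import numpy as np


def weak_learner_features(ensembles, X):
    """Return an (n_samples, sum_i s_i) matrix whose columns are h_{i,j}(x)."""
    columns = [h.predict(X) for H_i in ensembles for h in H_i.estimators_]
    return np.column_stack(columns)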