22.2.3 Boosting
Boosting combines a set of weak learners to create a single strong learner. A weak
learner is defined as a classifier that is only slightly correlated with the true classification (i.e., it can label examples better than random guessing). In contrast, a strong
learner is a classifier that is arbitrarily well correlated with the true classification.
Here we take AdaBoost as an example to illustrate the basic idea of Boosting
algorithms. AdaBoost consists of iteratively learning weak classifiers with respect
to a distribution and adding them to a final strong classifier. When a weak learner
is added, it is weighted according to its accuracy, and the training data are then
reweighted: examples that are misclassified gain weight and examples that are
classified correctly lose weight. Thus, future weak learners focus more on the
examples that previous weak learners misclassified.
Specifically, AdaBoost performs the learning in an iterative manner (t = 1, ..., T).
The algorithm maintains a distribution or set of weights over the training set. The
weight of this distribution on training example x_i on round t is denoted D_t(x_i).
Initially, all weights are set equally, but on each round the weights of incorrectly
classified examples are increased. The details of the AdaBoost algorithm are shown
in Algorithm 1.
Algorithm 1: Learning algorithm for AdaBoost
    Input: training instances {(x_i, y_i)}, i = 1, ..., n
    Given: initial distribution D_1(x_i) = 1/n
    For t = 1, ..., T
        Train weak learner f_t based on distribution D_t
        Get weak learner f_t: X → {−1, 1}, whose error on D_t is ε_t = Pr_{D_t}[f_t(x_i) ≠ y_i]
        Choose α_t = (1/2) log((1 − ε_t) / ε_t)
        Update D_{t+1}(x_i) = D_t(x_i) exp(−α_t y_i f_t(x_i)) / Z_t, where Z_t is a normalizer
    Output: f(x) = Σ_t α_t f_t(x)
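As a concrete illustration, the following Python sketch follows Algorithm 1 step by step. It assumes numpy and scikit-learn are available and uses a depth-one decision tree (a decision stump) as the weak learner; this choice of weak learner and the function names adaboost_fit and adaboost_predict are illustrative, not prescribed by the algorithm itself.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    # y must take values in {-1, +1}
    y = np.asarray(y)
    n = len(y)
    D = np.full(n, 1.0 / n)             # initial distribution D_1(x_i) = 1/n
    weak_learners, alphas = [], []
    for t in range(T):
        # Train weak learner f_t based on distribution D_t (here: a decision stump)
        f_t = DecisionTreeClassifier(max_depth=1)
        f_t.fit(X, y, sample_weight=D)
        pred = f_t.predict(X)
        # Weighted error eps_t = Pr_{D_t}[f_t(x_i) != y_i]
        eps = float(np.sum(D[pred != y]))
        if eps >= 0.5:                  # no better than random guessing: stop
            break
        eps = max(eps, 1e-12)           # guard against log(0)
        # alpha_t = 1/2 * log((1 - eps_t) / eps_t)
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # D_{t+1}(x_i) = D_t(x_i) * exp(-alpha_t * y_i * f_t(x_i)) / Z_t
        D = D * np.exp(-alpha * y * pred)
        D /= D.sum()                    # Z_t is the normalizer
        weak_learners.append(f_t)
        alphas.append(alpha)
    return weak_learners, alphas

def adaboost_predict(X, weak_learners, alphas):
    # f(x) = sum_t alpha_t * f_t(x); the predicted label is its sign
    scores = sum(a * f.predict(X) for f, a in zip(weak_learners, alphas))
    return np.sign(scores)

The final prediction is the sign of the weighted vote f(x) = Σ_t α_t f_t(x), e.g. adaboost_predict(X_test, *adaboost_fit(X_train, y_train, T=100)).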
Previous work shows that Boosting achieves state-of-the-art classification performance and can avoid overfitting in some cases. That is, as more iterations are
conducted, even after the training error has dropped to zero, the test error can still
be reduced. This phenomenon has been explained using margin theory: although
additional iterations cannot further improve the training error, they can still increase
the margin of the training data (i.e., the classification confidence on the training
data).
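To make the margin argument concrete, the normalized margin of a training example (x_i, y_i) is commonly defined as y_i Σ_t α_t f_t(x_i) / Σ_t α_t, a value in [−1, 1] that measures how confidently the weighted vote places the example on the correct side. The short sketch below reuses the illustrative adaboost_fit output and imports from the previous example to compute these margins; tracking their distribution as T grows typically shows them increasing even after the training error has reached zero.

def normalized_margins(X, y, weak_learners, alphas):
    # Margin of each example: y_i * sum_t alpha_t * f_t(x_i) / sum_t alpha_t
    # Values lie in [-1, 1]; larger means a more confident correct vote.
    scores = sum(a * f.predict(X) for f, a in zip(weak_learners, alphas))
    return np.asarray(y) * scores / sum(alphas)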