22.2.3 Boosting
Boosting combines a set of weak learners to create a single strong learner. A weak
learner is defined as a classifier that is only slightly correlated with the true classification (i.e., it can label examples better than random guessing). In contrast, a strong
learner is a classifier that is arbitrarily well correlated with the true classification.
Here we take AdaBoost as an example to illustrate the basic idea of Boosting
algorithms. AdaBoost consists of iteratively learning weak classifiers with respect
to a distribution and adding them to a final strong classifier. When a weak learner
is added, it is weighted according to its accuracy, and the training data are then
reweighted: examples that are misclassified gain weight and examples that are
classified correctly lose weight. Thus, future weak learners focus more on the
examples that previous weak learners misclassified.
Specifically, AdaBoost performs the learning in an iterative manner (t = 1, ..., T).
The algorithm maintains a distribution or set of weights over the training set. The
weight of this distribution on training example x_i on round t is denoted D_t(x_i).
Initially, all weights are set equally, but on each round the weights of incorrectly
classified examples are increased. The details of the AdaBoost algorithm are shown
in Algorithm 1.
Algorithm 1: Learning algorithm for AdaBoost
    Input: training instances {(x_i, y_i)}, i = 1, ..., n
    Given: initial distribution D_1(x_i) = 1/n
    For t = 1, ..., T
        Train weak learner f_t based on distribution D_t
        Get weak learner f_t: X → {−1, 1}, whose error on D_t is ε_t = Pr_{D_t}[f_t(x_i) ≠ y_i]
        Choose α_t = (1/2) log((1 − ε_t) / ε_t)
        Update D_{t+1}(x_i) = D_t(x_i) exp(−α_t y_i f_t(x_i)) / Z_t, where Z_t is a normalizer
    Output: f(x) = Σ_t α_t f_t(x)
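As a concrete illustration, the following Python sketch follows Algorithm 1 step by step. It assumes numpy and scikit-learn are available and uses a depth-one decision tree (a decision stump) as the weak learner; this choice of weak learner and the function names adaboost_fit and adaboost_predict are illustrative, not prescribed by the algorithm itself.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    # y must take values in {-1, +1}
    y = np.asarray(y)
    n = len(y)
    D = np.full(n, 1.0 / n)             # initial distribution D_1(x_i) = 1/n
    weak_learners, alphas = [], []
    for t in range(T):
        # Train weak learner f_t based on distribution D_t (here: a decision stump)
        f_t = DecisionTreeClassifier(max_depth=1)
        f_t.fit(X, y, sample_weight=D)
        pred = f_t.predict(X)
        # Weighted error eps_t = Pr_{D_t}[f_t(x_i) != y_i]
        eps = float(np.sum(D[pred != y]))
        if eps >= 0.5:                  # no better than random guessing: stop
            break
        eps = max(eps, 1e-12)           # guard against log(0)
        # alpha_t = 1/2 * log((1 - eps_t) / eps_t)
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # D_{t+1}(x_i) = D_t(x_i) * exp(-alpha_t * y_i * f_t(x_i)) / Z_t
        D = D * np.exp(-alpha * y * pred)
        D /= D.sum()                    # Z_t is the normalizer
        weak_learners.append(f_t)
        alphas.append(alpha)
    return weak_learners, alphas

def adaboost_predict(X, weak_learners, alphas):
    # f(x) = sum_t alpha_t * f_t(x); the predicted label is its sign
    scores = sum(a * f.predict(X) for f, a in zip(weak_learners, alphas))
    return np.sign(scores)

The final prediction is the sign of the weighted vote f(x) = Σ_t α_t f_t(x), e.g. adaboost_predict(X_test, *adaboost_fit(X_train, y_train, T=100)).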
Previous work shows that Boosting achieves state-of-the-art classification performance and can avoid overfitting in some cases. That is, as more iterations are
conducted, even after the training error has dropped to zero, the test error can still
be reduced. This phenomenon has been explained using margin theory: although
additional iterations cannot further improve the training error, they can still increase
the margin of the training data (i.e., the classification confidence on the training
data).
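To make the margin argument concrete, the normalized margin of a training example (x_i, y_i) is commonly defined as y_i Σ_t α_t f_t(x_i) / Σ_t α_t, a value in [−1, 1] that measures how confidently the weighted vote places the example on the correct side. The short sketch below reuses the illustrative adaboost_fit output and imports from the previous example to compute these margins; tracking their distribution as T grows typically shows them increasing even after the training error has reached zero.

def normalized_margins(X, y, weak_learners, alphas):
    # Margin of each example: y_i * sum_t alpha_t * f_t(x_i) / sum_t alpha_t
    # Values lie in [-1, 1]; larger means a more confident correct vote.
    scores = sum(a * f.predict(X) for f, a in zip(weak_learners, alphas))
    return np.asarray(y) * scores / sum(alphas)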