Algorithm 1. Greedy-Boost algorithm

S = {(x_1, y_1), ..., (x_n, y_n)} is a set of labeled instances (connections)
for i = 1 to n do
    Initialize p_0(x_i) = 1/n
end for
for t = 1 to T do
    Choose S_t from S using bootstrap sampling.
    Construct model h_t for S_t using weak learner A.
    Calculate ε_t, the error rate of h_t.
    Calculate the score of the classifier α_t = 1/2 ln((1 − ε_t)/ε_t).
    for j = 1 to n do
        If argmax_{y ∈ Y} Σ_{i=1..t} α_i [h_i(x_j) = y] = y_j (the connection is well classified):
            p_{t+1}(x_j) ← (p_0(x_j)/Z_t) · e^{−α_t}
        If argmax_{y ∈ Y} Σ_{i=1..t} α_i [h_i(x_j) = y] ≠ y_j (the connection is badly classified):
            p_{t+1}(x_j) ← (p_0(x_j)/Z_t) · e^{+α_t}
        Use Z_t to normalize so that Σ_{j=1..n} p_{t+1}(x_j) = 1
    end for
end for
return H(x) = argmax_{y ∈ Y} Σ_{t=1..T} α_t [h_t(x) = y]
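A minimal sketch of Algorithm 1 in Python may help make the update rule concrete. The weak learner A is not specified above, so a depth-one decision tree (a stump) from scikit-learn stands in for it here; the function name greedy_boost, the default of T = 10 rounds, and the bootstrap sample size are illustrative assumptions, not part of the original description.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def greedy_boost(X, y, T=10, seed=0):
    """Sketch of Greedy-Boost: every round rescales the *initial*
    distribution p_0, not the distribution of the previous round."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    classes = np.unique(y)
    p0 = np.full(n, 1.0 / n)        # p_0(x_i) = 1/n
    p = p0.copy()                   # current distribution p_t
    models, alphas = [], []

    for t in range(T):
        # S_t: bootstrap sample drawn from S according to p_t
        idx = rng.choice(n, size=n, replace=True, p=p)
        h = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])

        # eps_t: weighted error of h_t; alpha_t = 1/2 ln((1 - eps_t)/eps_t)
        eps = np.average(h.predict(X) != y, weights=p)
        eps = np.clip(eps, 1e-10, 0.5 - 1e-10)  # clamp to the weak-learner regime assumed above
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        models.append(h)
        alphas.append(alpha)

        # current combined classifier H_t(x_j) = argmax_y sum_{i<=t} alpha_i [h_i(x_j) = y]
        votes = np.zeros((n, len(classes)))
        for a, m in zip(alphas, models):
            votes[np.arange(n), np.searchsorted(classes, m.predict(X))] += a
        well_classified = classes[votes.argmax(axis=1)] == y

        # Greedy-Boost update: rescale p_0 (not p_t), then normalize by Z_t
        p = p0 * np.where(well_classified, np.exp(-alpha), np.exp(alpha))
        p /= p.sum()                # division by Z_t

    def H(X_new):
        """Combined classifier H(x) = argmax_y sum_t alpha_t [h_t(x) = y]."""
        X_new = np.asarray(X_new)
        votes = np.zeros((len(X_new), len(classes)))
        for a, m in zip(alphas, models):
            votes[np.arange(len(X_new)), np.searchsorted(classes, m.predict(X_new))] += a
        return classes[votes.argmax(axis=1)]

    return H
```

The only line that departs from standard AdaBoost is the update p = p0 * ..., which rescales the initial distribution p_0 rather than the previous p_t, exactly as in the pseudocode above.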
At each iteration, the new distribution is computed from the initial distribution instead of the previous one (all examples start with the same weight). In other words, we emphasize the examples badly classified by the current combined model. This produces a new sample, which is used to generate a new model. The interest of such a process is that it necessarily converges towards a stable solution, under the hypothesis that at each iteration we are able to find a weak learner, i.e., one with an error ε_t < 1/2 on the weighted sample.
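Written side by side, the only difference from the standard AdaBoost rule (recalled here purely for contrast) is which distribution gets rescaled:

p_{t+1}(x_j) = (p_t(x_j)/Z_t) · e^{∓α_t}   (AdaBoost: rescales the previous distribution; the sign is set by whether h_t classifies x_j correctly)
p_{t+1}(x_j) = (p_0(x_j)/Z_t) · e^{∓α_t}   (Greedy-Boost, Algorithm 1: rescales the initial distribution; the sign is set by whether the current combined classifier classifies x_j correctly)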
Theorem. Under the hypothesis that at each iteration the learner constructed by Greedy-Boost is at least a weak learner, i.e., ∀t, ε_t < 1/2, Greedy-Boost necessarily converges towards a stable solution when T → ∞.
Proof: The proof is based on a theorem of [12], which ensures that at iteration t+1 the error on the learning sample of the combined classifier H_{t+1} of AdaBoost is lower than that of H_t, the combined classifier of iteration t:

err(H_{t+1}) < err(H_t) × 1/(4(1 − ε_t)²)          (1)

where err(·) is the empirical error. With ε_t < 1/2, we have 1/(4(1 − ε_t)²) < 1, so
err(H_{t+1}) < err(H_t). We consider that the solution of Greedy-Boost is equivalent to the solution of AdaBoost in the first two iterations: H^{GRE}_1(x) = H^{Ada}_1(x). Afterwards, at each iteration, we change the distribution p_t in the same way as AdaBoost, based on the linear combination of the models previously generated. This process is equivalent to applying AdaBoost to the current classifier H^{GRE}_t. So, based on (1), we have err(H^{GRE}_{t+1}) < err(H^{GRE}_t) × 1/(4(1 − ε_t)²). Consequently, at iteration T, we have:
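err(H^{GRE}_T) < err(H^{GRE}_1) × ∏_{t=1..T−1} 1/(4(1 − ε_t)²),

obtained by applying (1) at each of the iterations 1, ..., T − 1. Since ε_t < 1/2 for every t, each factor of this product is strictly smaller than 1, so the empirical error of the combined classifier decreases at every iteration and Greedy-Boost converges towards a stable solution as T → ∞.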
 