Algorithm: AdaBoost. A boosting algorithm: create an ensemble of classifiers, each of which gives a weighted vote.

Input:
  D, a set of d class-labeled training tuples;
  k, the number of rounds (one classifier is generated per round);
  a classification learning scheme.

Output: A composite model.

Method:
(1)  initialize the weight of each tuple in D to 1/d;
(2)  for i = 1 to k do // for each round:
(3)      sample D with replacement according to the tuple weights to obtain D_i;
(4)      use training set D_i to derive a model, M_i;
(5)      compute error(M_i), the error rate of M_i (Eq. 8.34);
(6)      if error(M_i) > 0.5 then
(7)          go back to step 3 and try again;
(8)      endif
(9)      for each tuple in D_i that was correctly classified do
(10)         multiply the weight of the tuple by error(M_i)/(1 - error(M_i)); // update weights
(11)     normalize the weight of each tuple;
(12) endfor

To use the ensemble to classify tuple X:
(1)  initialize weight of each class to 0;
(2)  for i = 1 to k do // for each classifier:
(3)      w_i = log((1 - error(M_i))/error(M_i)); // weight of the classifier's vote
(4)      c = M_i(X); // get class prediction for X from M_i
(5)      add w_i to weight for class c
(6)  endfor
(7)  return the class with the largest weight;

Figure 8.24 AdaBoost, a boosting algorithm.
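The pseudocode of Figure 8.24 translates almost line for line into executable code. Below is a minimal Python sketch, assuming depth-1 decision trees (stumps) from scikit-learn as the base classification scheme; the function names ada_boost_fit and ada_boost_predict, the retry cap, and the choice to measure the weighted error over all of D are illustrative assumptions, not details fixed by the text.

    # A minimal sketch of Figure 8.24's AdaBoost, assuming decision stumps
    # from scikit-learn as the base learner. Names and the retry cap are
    # illustrative assumptions, not from the text.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def ada_boost_fit(X, y, k, max_retries=10, seed=0):
        """Train k weak models M_1..M_k on weighted resamples of (X, y)."""
        d = len(X)
        weights = np.full(d, 1.0 / d)      # (1) each tuple starts at weight 1/d
        rng = np.random.default_rng(seed)
        models, votes = [], []
        for _ in range(k):                 # (2) one classifier per round
            for _ in range(max_retries):
                # (3) sample D with replacement according to the tuple weights
                idx = rng.choice(d, size=d, replace=True, p=weights)
                model = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])  # (4)
                # (5) error(M_i): sum of the weights of the misclassified tuples
                miss = model.predict(X) != y
                error = weights[miss].sum()
                if error <= 0.5:           # (6)-(8) otherwise resample and retry
                    break
            error = max(error, 1e-10)      # guard the division for a perfect model
            # (9)-(10) shrink the weights of correctly classified tuples
            weights[~miss] *= error / (1.0 - error)
            weights /= weights.sum()       # (11) normalize the tuple weights
            models.append(model)
            votes.append(np.log((1.0 - error) / error))  # weight of this model's vote
        return models, votes

    def ada_boost_predict(models, votes, x):
        """Classify one tuple X by the ensemble's weighted vote."""
        class_weight = {}                  # (1) weight of each class starts at 0
        for model, w_i in zip(models, votes):
            c = model.predict(x.reshape(1, -1))[0]            # (4) c = M_i(X)
            class_weight[c] = class_weight.get(c, 0.0) + w_i  # (5) add w_i to class c
        return max(class_weight, key=class_weight.get)        # (7) heaviest class wins

For example, models, votes = ada_boost_fit(X_train, y_train, k=10) followed by ada_boost_predict(models, votes, X_test[0]) returns the predicted class label for a single test tuple (hypothetical variable names). Note that correctly classified tuples have their weights multiplied by error(M_i)/(1 - error(M_i)), a factor less than 1 whenever error(M_i) < 0.5, so after normalization the misclassified tuples carry relatively more weight in the next round.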
Because boosting focuses on the misclassified tuples, it risks overfitting the composite model to such data. Therefore, the resulting “boosted” model may sometimes be less accurate than a single model derived from the same data. Bagging is less susceptible to model overfitting. While both techniques can significantly improve accuracy over a single model, boosting tends to achieve greater accuracy.
8.6.4 Random Forests

We now present another ensemble method called random forests. Imagine that each of the classifiers in the ensemble is a decision tree classifier so that the collection of classifiers