Algorithm: AdaBoost. A boosting algorithm—create an ensemble of classifiers. Each one
gives a weighted vote.

Input:
    D, a set of d class-labeled training tuples;
    k, the number of rounds (one classifier is generated per round);
    a classification learning scheme.

Output: A composite model.

Method:
(1)  initialize the weight of each tuple in D to 1/d;
(2)  for i = 1 to k do // for each round:
(3)      sample D with replacement according to the tuple weights to obtain D_i;
(4)      use training set D_i to derive a model, M_i;
(5)      compute error(M_i), the error rate of M_i (Eq. 8.34);
(6)      if error(M_i) > 0.5 then
(7)          go back to step 3 and try again;
(8)      endif
(9)      for each tuple in D_i that was correctly classified do
(10)         multiply the weight of the tuple by error(M_i)/(1 - error(M_i)); // update weights
(11)     normalize the weight of each tuple;
(12) endfor

To use the ensemble to classify tuple X:
(1)  initialize weight of each class to 0;
(2)  for i = 1 to k do // for each classifier:
(3)      w_i = log((1 - error(M_i)) / error(M_i)); // weight of the classifier's vote
(4)      c = M_i(X); // get class prediction for X from M_i
(5)      add w_i to weight for class c;
(6)  endfor
(7)  return the class with the largest weight;

Figure 8.24 AdaBoost, a boosting algorithm.
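To make Figure 8.24 concrete, here is a minimal Python sketch of the same procedure. The base learner is passed in as a `learn` callback (a stand-in for whatever classification learning scheme is used), and error(M_i) is taken here to be the weight-sum of the misclassified tuples, which is our reading of Eq. (8.34); both choices are illustrative assumptions rather than part of the figure.

```python
import math
import random

def adaboost_train(D, labels, k, learn):
    """Train an AdaBoost ensemble following Figure 8.24.

    D      -- list of training tuples
    labels -- class label of each tuple in D
    k      -- number of rounds (one classifier per round)
    learn  -- any classification learning scheme:
              learn(tuples, labels) -> model, where model(x)
              returns a predicted class for tuple x
    """
    d = len(D)
    weights = [1.0 / d] * d                       # step (1): uniform weights
    models, errors = [], []
    while len(models) < k:                        # step (2): k rounds
        # step (3): sample D with replacement according to the tuple weights
        idx = random.choices(range(d), weights=weights, k=d)
        Mi = learn([D[j] for j in idx], [labels[j] for j in idx])  # step (4)
        # step (5): weighted error over the tuples (our reading of Eq. 8.34)
        err = sum(w for x, y, w in zip(D, labels, weights) if Mi(x) != y)
        if err > 0.5:                             # steps (6)-(8): abandon M_i,
            continue                              # resample and try again
        err = max(err, 1e-10)                     # guard: a perfect model would
                                                  # divide by zero below
        # steps (9)-(10): shrink the weights of correctly classified tuples
        weights = [w * err / (1.0 - err) if Mi(x) == y else w
                   for x, y, w in zip(D, labels, weights)]
        total = sum(weights)                      # step (11): renormalize
        weights = [w / total for w in weights]
        models.append(Mi)
        errors.append(err)
    return models, errors

def adaboost_classify(models, errors, X):
    """Classify tuple X by the weighted vote of the ensemble."""
    class_weight = {}                             # step (1): class weights start at 0
    for Mi, err in zip(models, errors):           # step (2): each classifier votes
        wi = math.log((1.0 - err) / err)          # step (3): weight of the vote
        c = Mi(X)                                 # step (4): M_i's prediction
        class_weight[c] = class_weight.get(c, 0.0) + wi  # step (5)
    return max(class_weight, key=class_weight.get)       # step (7): heaviest class
```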
Because of the way boosting focuses on the misclassified tuples, it risks overfitting
the resulting composite model to such data. Therefore, sometimes the resulting "boosted"
model may be less accurate than a single model derived from the same data. Bagging is
less susceptible to model overfitting. While both can significantly improve accuracy in
comparison to a single model, boosting tends to achieve greater accuracy.
8.6.4 Random Forests
We now present another ensemble method called random forests. Imagine that each of
the classifiers in the ensemble is a decision tree classifier so that the collection of
classifiers is a "forest."
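As a quick, self-contained illustration of the idea (a sketch using scikit-learn's off-the-shelf ensemble, not the construction developed in this section; the data set and parameter values are placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: any set of class-labeled tuples would do.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A "forest": an ensemble of decision trees, each grown on a bootstrap
# sample of the training tuples, predicting by vote over the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))
```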
 