Algorithm: AdaBoost. A boosting algorithm: create an ensemble of classifiers, each of which gives a weighted vote.

Input:
  D, a set of d class-labeled training tuples;
  k, the number of rounds (one classifier is generated per round);
  a classification learning scheme.

Output: A composite model.

Method:
(1)  initialize the weight of each tuple in D to 1/d;
(2)  for i = 1 to k do // for each round:
(3)      sample D with replacement according to the tuple weights to obtain D_i;
(4)      use training set D_i to derive a model, M_i;
(5)      compute error(M_i), the error rate of M_i (Eq. 8.34);
(6)      if error(M_i) > 0.5 then
(7)          go back to step 3 and try again;
(8)      endif
(9)      for each tuple in D_i that was correctly classified do
(10)         multiply the weight of the tuple by error(M_i)/(1 - error(M_i)); // update weights
(11)     normalize the weight of each tuple;
(12) endfor

To use the ensemble to classify tuple X:
(1)  initialize weight of each class to 0;
(2)  for i = 1 to k do // for each classifier:
(3)      w_i = log((1 - error(M_i))/error(M_i)); // weight of the classifier's vote
(4)      c = M_i(X); // get class prediction for X from M_i
(5)      add w_i to weight for class c
(6)  endfor
(7)  return the class with the largest weight;

Figure 8.24 AdaBoost, a boosting algorithm.
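The pseudocode of Figure 8.24 translates almost line for line into executable code. Below is a minimal Python sketch, assuming depth-1 decision trees (stumps) from scikit-learn as the base classification scheme; the function names ada_boost_fit and ada_boost_predict, the retry cap, and the choice to measure the weighted error over all of D are illustrative assumptions, not details fixed by the text.

    # A minimal sketch of Figure 8.24's AdaBoost, assuming decision stumps
    # from scikit-learn as the base learner. Names and the retry cap are
    # illustrative assumptions, not from the text.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def ada_boost_fit(X, y, k, max_retries=10, seed=0):
        """Train k weak models M_1..M_k on weighted resamples of (X, y)."""
        d = len(X)
        weights = np.full(d, 1.0 / d)      # (1) each tuple starts at weight 1/d
        rng = np.random.default_rng(seed)
        models, votes = [], []
        for _ in range(k):                 # (2) one classifier per round
            for _ in range(max_retries):
                # (3) sample D with replacement according to the tuple weights
                idx = rng.choice(d, size=d, replace=True, p=weights)
                model = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])  # (4)
                # (5) error(M_i): sum of the weights of the misclassified tuples
                miss = model.predict(X) != y
                error = weights[miss].sum()
                if error <= 0.5:           # (6)-(8) otherwise resample and retry
                    break
            error = max(error, 1e-10)      # guard the division for a perfect model
            # (9)-(10) shrink the weights of correctly classified tuples
            weights[~miss] *= error / (1.0 - error)
            weights /= weights.sum()       # (11) normalize the tuple weights
            models.append(model)
            votes.append(np.log((1.0 - error) / error))  # weight of this model's vote
        return models, votes

    def ada_boost_predict(models, votes, x):
        """Classify one tuple X by the ensemble's weighted vote."""
        class_weight = {}                  # (1) weight of each class starts at 0
        for model, w_i in zip(models, votes):
            c = model.predict(x.reshape(1, -1))[0]            # (4) c = M_i(X)
            class_weight[c] = class_weight.get(c, 0.0) + w_i  # (5) add w_i to class c
        return max(class_weight, key=class_weight.get)        # (7) heaviest class wins

For example, models, votes = ada_boost_fit(X_train, y_train, k=10) followed by ada_boost_predict(models, votes, X_test[0]) returns the predicted class label for a single test tuple (hypothetical variable names). Note that correctly classified tuples have their weights multiplied by error(M_i)/(1 - error(M_i)), a factor less than 1 whenever error(M_i) < 0.5, so after normalization the misclassified tuples carry relatively more weight in the next round.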
Because boosting focuses on the misclassified tuples, it risks overfitting the composite model to such data. Therefore, the resulting “boosted” model may sometimes be less accurate than a single model derived from the same data. Bagging is less susceptible to model overfitting. While both techniques can significantly improve accuracy over a single model, boosting tends to achieve greater accuracy.
8.6.4 Random Forests

We now present another ensemble method called random forests. Imagine that each of the classifiers in the ensemble is a decision tree classifier so that the collection of classifiers