Then, the trained ensemble is employed to generate a new training set by
replacing the desired class labels of the original training examples with the
output from the trained ensemble. Some extra training examples are also
generated from the trained ensemble and added to the new training set.
Finally, a C4.5 decision tree is grown from the new training set. Since the
learned result is a decision tree, the comprehensibility of NeC4.5 is better
than that of a neural network ensemble.
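To make the procedure concrete, the following is a minimal sketch of the
NeC4.5 idea in Python with scikit-learn. The bagged-MLP ensemble, the
Gaussian perturbation used to generate the extra examples, and the use of
scikit-learn's CART-style DecisionTreeClassifier in place of a true C4.5
learner are all illustrative assumptions, not the original algorithm's exact
components:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def nec45_sketch(X, y, n_extra=1000, seed=0):
    rng = np.random.default_rng(seed)

    # 1. Train a neural network ensemble (bagged MLPs; scikit-learn >= 1.2).
    ensemble = BaggingClassifier(
        estimator=MLPClassifier(hidden_layer_sizes=(10,), max_iter=500),
        n_estimators=10, random_state=seed,
    ).fit(X, y)

    # 2. Replace the original class labels with the ensemble's output.
    y_new = ensemble.predict(X)

    # 3. Generate extra examples -- here, Gaussian perturbations of random
    #    training points (an assumed generator) -- labeled by the ensemble.
    X_extra = X[rng.integers(len(X), size=n_extra)]
    X_extra = X_extra + rng.normal(scale=0.05, size=X_extra.shape)
    X_new = np.vstack([X, X_extra])
    y_new = np.concatenate([y_new, ensemble.predict(X_extra)])

    # 4. Grow a single, comprehensible decision tree from the new set.
    return DecisionTreeClassifier().fit(X_new, y_new)
```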
Using several inducers can resolve the dilemma that arises from the
“no free lunch” theorem. This theorem implies that a certain inducer
will be successful only insofar as its bias matches the characteristics of the
application domain [Brazdil et al. (1994)]. Thus, given a certain application,
the practitioner needs to decide which inducer to use. Using a
multi-inducer obviates the need to try each one and simplifies the entire
process.
9.5.6 Measuring the Diversity
As stated above, it is usually assumed that increasing diversity may decrease
ensemble error [Zenobi and Cunningham (2001)]. For regression problems,
variance is usually used to measure diversity [Krogh and Vedelsby (1995)].
In such cases it can be easily shown that the ensemble error can be reduced
by increasing ensemble diversity while maintaining the average error of a
single model.
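This is the ambiguity decomposition of Krogh and Vedelsby (1995). Writing
the ensemble output as the weighted average \(\bar{f}(x) = \sum_\alpha w_\alpha f_\alpha(x)\),
the squared-error generalization error of the ensemble satisfies

\[
E = \bar{E} - \bar{A},
\qquad
\bar{E} = \sum_\alpha w_\alpha E_\alpha,
\qquad
\bar{A} = \sum_\alpha w_\alpha\, \mathbb{E}_x\!\left[\bigl(f_\alpha(x) - \bar{f}(x)\bigr)^2\right],
\]

where \(E_\alpha\) is the generalization error of member \(\alpha\) and the
ambiguity \(\bar{A} \ge 0\) plays the role of the diversity term. Because
\(\bar{A}\) is non-negative, the ensemble error never exceeds the weighted
average member error, and increasing \(\bar{A}\) while holding \(\bar{E}\)
fixed reduces \(E\).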
In classification problems, a more complicated measure is required to
evaluate diversity. There have been several attempts to define a diversity
measure for classification tasks.
In the neural network literature, two measures are presented for
examining diversity; a short computational sketch follows their definitions:
Classification coverage: An instance is covered by a classifier if the
classifier assigns it the correct label.
Coincident errors: A coincident error amongst the classifiers occurs when
more than one member misclassifies a given instance.
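Both measures can be computed directly from a matrix of member predictions;
the helper below is an assumed illustration, not code from the text:

```python
import numpy as np

def coverage_and_coincident_errors(preds, y):
    """preds: (n_members, n_instances) predicted labels; y: true labels."""
    correct = preds == y                          # per-member coverage matrix
    coverage = correct.mean(axis=1)               # fraction of instances each
                                                  # classifier covers
    errors_per_instance = (~correct).sum(axis=0)  # members wrong per instance
    coincident_rate = (errors_per_instance > 1).mean()
    return coverage, coincident_rate
```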
Based on these two measures, Sharkey (1997) defined four diversity levels:
Level 1 — No coincident errors and the classification function is
completely covered by a majority vote of the members.
Level 2 — Coincident errors may occur, but the classification function is
completely covered by a majority vote.
Level 3 — A majority vote will not always yield the correct classification,
but every instance is classified correctly by at least one ensemble member.
Level 4 — The classification function is not always covered by the members
of the ensemble.
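As an illustration of how these levels relate to the two measures above, here
is a hypothetical sketch (assumed, not from the text) that determines the
level an ensemble attains on a labeled test set. It assumes non-negative
integer labels; majority-vote ties are broken toward the smallest label, an
arbitrary choice:

```python
import numpy as np

def sharkey_level(preds, y):
    """preds: (n_members, n_instances) integer labels; y: true labels."""
    correct = preds == y
    no_coincident = ((~correct).sum(axis=0) <= 1).all()
    # Column-wise majority vote (assumes non-negative integer labels).
    majority = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
    majority_ok = (majority == y).all()
    covered = correct.any(axis=0).all()  # every instance right for some member
    if no_coincident and majority_ok:
        return 1
    if majority_ok:
        return 2
    if covered:
        return 3
    return 4
```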