[Figure: generalization error (0–35) plotted against training set size (0–4000).]
Fig. 4.9 Overfitting in decision trees.
Another aspect of overfitting is presented in Figure 4.9. This graph plots the generalization error against increasing training set sizes. The generalization error decreases as the training set grows: with a small training set it is relatively hard to generalize, and the classifier simply memorizes all instances.
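The shape of such a curve can be reproduced in spirit by measuring cross-validated error at increasing training set sizes. The following sketch assumes scikit-learn and a synthetic dataset; the data and parameter choices are illustrative and are not those behind Figure 4.9.

# Sketch: learning curve for a decision tree on a synthetic dataset
# (illustrative assumptions, not the experiment behind Fig. 4.9).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, flip_y=0.05,
                           random_state=0)

sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.05, 1.0, 8), cv=5)

# Generalization error = 1 - cross-validated accuracy; it typically
# shrinks as the training set grows, mirroring the trend in Fig. 4.9.
for n, err in zip(sizes, 1.0 - test_scores.mean(axis=1)):
    print(f"train size {n:5d}: generalization error {err:.3f}")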
It was found that overfitting decreases prediction accuracy in decision trees either in the presence of significant noise or when the input attributes are irrelevant to the classification problem [Schaffer (1991)].
In order to avoid overfitting, it is necessary to estimate when further training will not result in better generalization. In decision trees there are two mechanisms that help to avoid overfitting. The first is to avoid splitting a node when the split is not useful (for instance, by accepting only statistically significant splits). The second is pruning: after growing the tree, unnecessary nodes are pruned, as the sketch below illustrates.
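The sketch below illustrates both mechanisms, assuming scikit-learn as the toolkit. Scikit-learn does not offer a statistical-significance pre-pruning test, so min_impurity_decrease is used here as a rough stand-in for "accept only useful splits"; post-pruning is shown via cost-complexity pruning (ccp_alpha).

# Sketch of the two overfitting-avoidance mechanisms (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) Pre-pruning: refuse splits that do not reduce impurity enough
#    (a stand-in for accepting only statistically significant splits).
pre = DecisionTreeClassifier(min_impurity_decrease=1e-3,
                             random_state=0).fit(X_tr, y_tr)

# 2) Post-pruning: grow the full tree, then prune it with a
#    cost-complexity penalty chosen from the pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # illustrative choice
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)

for name, clf in [("pre-pruned", pre), ("post-pruned", post)]:
    print(name, "leaves:", clf.get_n_leaves(),
          "test accuracy:", round(clf.score(X_te, y_te), 3))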
4.10 “No Free Lunch” Theorem
Empirical comparison of the performance of different approaches and their variants in a wide range of application domains has shown that each performs best in some, but not all, domains. This has been termed the selective superiority problem [Brodley (1995)].
It is well known that no induction algorithm can be the best in all possible domains; each algorithm contains an explicit or implicit bias that leads it to prefer certain generalizations over others [Mitchell (1980)]. The algorithm will be successful only insofar as this bias matches the characteristics of the application domain [Brazdil et al. (1994)]. Furthermore, other results