[Figure: generalization error (0–35) plotted against training set size (0–4000).]
Fig. 4.9 Overfitting in decision trees.
Another aspect of overfitting is presented in Figure 4.9. This graph plots the generalization error against increasing training set sizes. The generalization error decreases as the training set grows: with a small training set it is relatively hard to generalize, and the classifier simply memorizes all instances.
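The shape of such a curve can be reproduced in spirit by measuring cross-validated error at increasing training set sizes. The following sketch assumes scikit-learn and a synthetic dataset; the data and parameter choices are illustrative and are not those behind Figure 4.9.

# Sketch: learning curve for a decision tree on a synthetic dataset
# (illustrative assumptions, not the experiment behind Fig. 4.9).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, flip_y=0.05,
                           random_state=0)

sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.05, 1.0, 8), cv=5)

# Generalization error = 1 - cross-validated accuracy; it typically
# shrinks as the training set grows, mirroring the trend in Fig. 4.9.
for n, err in zip(sizes, 1.0 - test_scores.mean(axis=1)):
    print(f"train size {n:5d}: generalization error {err:.3f}")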
It was found that overfitting decreases prediction accuracy in decision trees either in the presence of significant noise or when the input attributes are irrelevant to the classification problem [Schaffer (1991)].
In order to avoid overfitting, it is necessary to estimate when further training will not result in better generalization. In decision trees there are two mechanisms that help to avoid overfitting. The first is to avoid splitting a node when the split is not useful (for instance, by accepting only statistically significant splits). The second is pruning: after growing the tree, unnecessary nodes are pruned, as the sketch below illustrates.
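The sketch below illustrates both mechanisms, assuming scikit-learn as the toolkit. Scikit-learn does not offer a statistical-significance pre-pruning test, so min_impurity_decrease is used here as a rough stand-in for "accept only useful splits"; post-pruning is shown via cost-complexity pruning (ccp_alpha).

# Sketch of the two overfitting-avoidance mechanisms (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) Pre-pruning: refuse splits that do not reduce impurity enough
#    (a stand-in for accepting only statistically significant splits).
pre = DecisionTreeClassifier(min_impurity_decrease=1e-3,
                             random_state=0).fit(X_tr, y_tr)

# 2) Post-pruning: grow the full tree, then prune it with a
#    cost-complexity penalty chosen from the pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # illustrative choice
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)

for name, clf in [("pre-pruned", pre), ("post-pruned", post)]:
    print(name, "leaves:", clf.get_n_leaves(),
          "test accuracy:", round(clf.score(X_te, y_te), 3))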
4.10 “No Free Lunch” Theorem
Empirical comparison of the performance of different approaches and their variants in a wide range of application domains has shown that each performs best in some, but not all, domains. This has been termed the selective superiority problem [Brodley (1995)].
It is well known that no induction algorithm can be the best in all possible domains; each algorithm contains an explicit or implicit bias that leads it to prefer certain generalizations over others [Mitchell (1980)]. The algorithm will be successful only insofar as this bias matches the characteristics of the application domain [Brazdil et al. (1994)]. Furthermore, other results