where L(y, DT(S)(x)) is the zero-one loss function defined in Equation (3.2).
In this topic, classification accuracy is the primary evaluation criterion
for experiments.
Although generalization error is a natural criterion, its actual value is
known only in rare cases (mainly synthetic cases). The reason for that is
that the distribution D of the labeled instance space is not known.
One can take the training error as an estimation of the generalization
error. However, using the training error as is will typically provide an
optimistically biased estimate, especially if the inducer overfits the training
data. There are two main approaches for estimating the generalization error:
theoretical and empirical. In this topic, we utilize both approaches.
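The optimistic bias of the training error is easy to demonstrate. The sketch below is an illustration of our own (the synthetic data and the 1-nearest-neighbor memorizer are not from the text): a classifier that memorizes noisy training data attains zero training error, while its error on fresh examples drawn from the same distribution is far higher.

```python
import random

def one_nn_predict(train_data, x):
    # predict the label of the nearest training point (pure memorization)
    return min(train_data, key=lambda p: abs(p[0] - x))[1]

def sample(rng, n, noise=0.2):
    # true concept: label 1 iff x > 0.5, corrupted by 20% label noise
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
train_data, fresh_data = sample(rng, 100), sample(rng, 1000)
train_err = sum(one_nn_predict(train_data, x) != y
                for x, y in train_data) / len(train_data)
fresh_err = sum(one_nn_predict(train_data, x) != y
                for x, y in fresh_data) / len(fresh_data)
print(train_err)  # 0.0: the memorizer is perfect on its own training set
print(fresh_err)  # substantially higher on unseen examples
```

Taking `train_err` at face value here would badly underestimate the true generalization error, which is the bias described above.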
4.2.1 Theoretical Estimation of Generalization Error
A low training error does not guarantee low generalization error. There is
often a trade-off between the training error and the confidence assigned to
the training error as a predictor for the generalization error, measured by
the difference between the generalization and training errors. The capacity
of the inducer is a major factor in determining the degree of confidence
in the training error. In general, the capacity of an inducer indicates the
variety of classifiers it can induce. The VC-dimension presented below can
be used as a measure of the inducer's capacity.
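As an informal illustration of the VC-dimension (this toy checker is our own construction, not from the text): the class of interval classifiers on the real line, which label x as 1 iff a <= x <= b, can shatter any two distinct points but no three, so its VC-dimension is 2. A brute-force check:

```python
from itertools import product

def interval_labels(points, a, b):
    # labeling produced by the interval classifier h(x) = 1 iff a <= x <= b
    return tuple(1 if a <= x <= b else 0 for x in points)

def shatters(points):
    # brute force: can interval classifiers realize every labeling of `points`?
    # candidate endpoints sit just below, at, and just above each point
    # (assumes the points are spaced more than 0.2 apart)
    cands = [v for x in points for v in (x - 0.1, x, x + 0.1)]
    achievable = {interval_labels(points, a, b) for a in cands for b in cands}
    return all(lab in achievable
               for lab in product((0, 1), repeat=len(points)))

print(shatters([1.0, 2.0]))       # True: two points can be shattered
print(shatters([1.0, 2.0, 3.0]))  # False: (1, 0, 1) needs a non-convex set
```

A richer class (say, unions of two intervals) would shatter more points, which is precisely the sense in which capacity measures the variety of classifiers an inducer can produce.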
Decision trees with many nodes, relative to the size of the training
set, are likely to obtain a low training error. On the other hand, they
might just be memorizing or overfitting the patterns and hence exhibit
a poor generalization ability. In such cases, the low training error is likely
to be a poor predictor of the higher generalization error. When the opposite
occurs, that is to say, when capacity is too small for the given number of
examples, inducers may underfit the data, and exhibit both poor training
and generalization error.
In “Mathematics of Generalization”, Wolpert (1995) discusses four
theoretical frameworks for estimating the generalization error: PAC, VC,
Bayesian, and statistical physics. All these frameworks combine the training
error (which can be easily calculated) with some penalty function expressing
the capacity of the inducers.
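A VC-style bound illustrates this structure concretely: generalization error is bounded by training error plus a penalty that grows with capacity (the VC-dimension) and shrinks with the sample size. The sketch below uses one common textbook form of the penalty; the exact constants differ across the four frameworks, so treat it as illustrative rather than as any one framework's formula.

```python
import math

def vc_generalization_bound(train_err, vc_dim, n, delta=0.05):
    # one common textbook form of the VC penalty: with probability at
    # least 1 - delta the generalization error lies below this bound
    # (exact constants vary between frameworks and sources)
    penalty = math.sqrt((vc_dim * (math.log(2.0 * n / vc_dim) + 1.0)
                         + math.log(4.0 / delta)) / n)
    return train_err + penalty

loose = vc_generalization_bound(0.05, vc_dim=10, n=1_000)
tight = vc_generalization_bound(0.05, vc_dim=10, n=100_000)
print(loose, tight)  # the penalty shrinks as the sample grows
```

Note how the two ingredients named above appear explicitly: the easily computed training error, plus a penalty term expressing capacity relative to the amount of data.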
4.2.2 Empirical Estimation of Generalization Error
Another approach for estimating the generalization error is the holdout
method, in which the given dataset is randomly partitioned into two sets:
a training set and a test set.
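A minimal sketch of such a random partition (the one-third test fraction and the name `holdout_split` are illustrative choices of ours, not from the text):

```python
import random

def holdout_split(dataset, test_fraction=1/3, seed=0):
    # randomly partition the data into a training set and a test set
    data = list(dataset)
    random.Random(seed).shuffle(data)
    cut = int(round(len(data) * (1 - test_fraction)))
    return data[:cut], data[cut:]

examples = [(i, i % 2) for i in range(30)]
train_set, test_set = holdout_split(examples)
print(len(train_set), len(test_set))  # 20 10
```

The inducer is then trained on the first set only, and its error on the held-out second set serves as the empirical estimate of the generalization error.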