where L(y, DT(S)(x)) is the zero-one loss function defined in Equation (3.2).
In this topic, classification accuracy is the primary evaluation criterion
for experiments.
Although generalization error is a natural criterion, its actual value is
known only in rare cases (mainly synthetic cases). The reason for that is
that the distribution D of the labeled instance space is not known.
One can take the training error as an estimation of the generalization
error. However, using the training error as is will typically provide an
optimistically biased estimate, especially if the inducer overfits the training
data. There are two main approaches for estimating the generalization error:
theoretical and empirical. In this topic, we utilize both approaches.
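The optimistic bias of the training error is easy to demonstrate. The sketch below is an illustration of our own (the synthetic data and the 1-nearest-neighbor memorizer are not from the text): a classifier that memorizes noisy training data attains zero training error, while its error on fresh examples drawn from the same distribution is far higher.

```python
import random

def one_nn_predict(train_data, x):
    # predict the label of the nearest training point (pure memorization)
    return min(train_data, key=lambda p: abs(p[0] - x))[1]

def sample(rng, n, noise=0.2):
    # true concept: label 1 iff x > 0.5, corrupted by 20% label noise
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
train_data, fresh_data = sample(rng, 100), sample(rng, 1000)
train_err = sum(one_nn_predict(train_data, x) != y
                for x, y in train_data) / len(train_data)
fresh_err = sum(one_nn_predict(train_data, x) != y
                for x, y in fresh_data) / len(fresh_data)
print(train_err)  # 0.0: the memorizer is perfect on its own training set
print(fresh_err)  # substantially higher on unseen examples
```

Taking `train_err` at face value here would badly underestimate the true generalization error, which is the bias described above.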
4.2.1 Theoretical Estimation of Generalization Error
A low training error does not guarantee low generalization error. There is
often a trade-off between the training error and the confidence assigned to
the training error as a predictor for the generalization error, measured by
the difference between the generalization and training errors. The capacity
of the inducer is a major factor in determining the degree of confidence
in the training error. In general, the capacity of an inducer indicates the
variety of classifiers it can induce. The VC-dimension presented below can
be used as a measure of the inducer's capacity.
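As an informal illustration of the VC-dimension (this toy checker is our own construction, not from the text): the class of interval classifiers on the real line, which label x as 1 iff a <= x <= b, can shatter any two distinct points but no three, so its VC-dimension is 2. A brute-force check:

```python
from itertools import product

def interval_labels(points, a, b):
    # labeling produced by the interval classifier h(x) = 1 iff a <= x <= b
    return tuple(1 if a <= x <= b else 0 for x in points)

def shatters(points):
    # brute force: can interval classifiers realize every labeling of `points`?
    # candidate endpoints sit just below, at, and just above each point
    # (assumes the points are spaced more than 0.2 apart)
    cands = [v for x in points for v in (x - 0.1, x, x + 0.1)]
    achievable = {interval_labels(points, a, b) for a in cands for b in cands}
    return all(lab in achievable
               for lab in product((0, 1), repeat=len(points)))

print(shatters([1.0, 2.0]))       # True: two points can be shattered
print(shatters([1.0, 2.0, 3.0]))  # False: (1, 0, 1) needs a non-convex set
```

A richer class (say, unions of two intervals) would shatter more points, which is precisely the sense in which capacity measures the variety of classifiers an inducer can produce.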
Decision trees with many nodes, relative to the size of the training
set, are likely to obtain a low training error. On the other hand, they
might just be memorizing or overfitting the patterns and hence exhibit
a poor generalization ability. In such cases, the low training error is likely
to be a poor predictor of the higher generalization error. When the opposite
occurs, that is to say, when capacity is too small for the given number of
examples, inducers may underfit the data, and exhibit both poor training
and generalization error.
In “Mathematics of Generalization”, Wolpert (1995) discusses four
theoretical frameworks for estimating the generalization error: PAC, VC,
Bayesian, and statistical physics. All these frameworks combine the training
error (which can be easily calculated) with some penalty function expressing
the capacity of the inducers.
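A VC-style bound illustrates this structure concretely: generalization error is bounded by training error plus a penalty that grows with capacity (the VC-dimension) and shrinks with the sample size. The sketch below uses one common textbook form of the penalty; the exact constants differ across the four frameworks, so treat it as illustrative rather than as any one framework's formula.

```python
import math

def vc_generalization_bound(train_err, vc_dim, n, delta=0.05):
    # one common textbook form of the VC penalty: with probability at
    # least 1 - delta the generalization error lies below this bound
    # (exact constants vary between frameworks and sources)
    penalty = math.sqrt((vc_dim * (math.log(2.0 * n / vc_dim) + 1.0)
                         + math.log(4.0 / delta)) / n)
    return train_err + penalty

loose = vc_generalization_bound(0.05, vc_dim=10, n=1_000)
tight = vc_generalization_bound(0.05, vc_dim=10, n=100_000)
print(loose, tight)  # the penalty shrinks as the sample grows
```

Note how the two ingredients named above appear explicitly: the easily computed training error, plus a penalty term expressing capacity relative to the amount of data.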
4.2.2 Empirical Estimation of Generalization Error
Another approach for estimating the generalization error is the holdout
method, in which the given dataset is randomly partitioned into two sets:
a training set and a test set.
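A minimal sketch of such a random partition (the one-third test fraction and the name `holdout_split` are illustrative choices of ours, not from the text):

```python
import random

def holdout_split(dataset, test_fraction=1/3, seed=0):
    # randomly partition the data into a training set and a test set
    data = list(dataset)
    random.Random(seed).shuffle(data)
    cut = int(round(len(data) * (1 - test_fraction)))
    return data[:cut], data[cut:]

examples = [(i, i % 2) for i in range(30)]
train_set, test_set = holdout_split(examples)
print(len(train_set), len(test_set))  # 20 10
```

The inducer is then trained on the first set only, and its error on the held-out second set serves as the empirical estimate of the generalization error.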