In the holdout method, the data is partitioned into two disjoint subsets:
the training set and the test set. Usually, two-thirds of the data is allocated
to the training set and the remaining third to the test set. First,
the training set is used by the inducer to construct a suitable classifier and
then we measure the misclassification rate of this classifier on the test set.
This test set error usually provides a better estimation of the generalization
error than the training error. The reason is that the training error usually
underestimates the generalization error (due to the overfitting
phenomenon). Nevertheless, since only a proportion of the data is used to
derive the model, the estimate of accuracy tends to be pessimistic.
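As a minimal sketch of the holdout procedure (the data set, the scikit-learn helpers and the decision-tree inducer are illustrative choices, not part of the original text), the two-thirds/one-third split mentioned above might look like this:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical example: any data set and inducer could be substituted.
X, y = load_iris(return_X_y=True)

# Hold out one-third of the data for testing (two-thirds for training).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0)

inducer = DecisionTreeClassifier(random_state=0)
inducer.fit(X_train, y_train)

# The misclassification rate on the held-out test set is the holdout
# estimate of the generalization error.
test_error = 1.0 - inducer.score(X_test, y_test)
print(f"Holdout test error: {test_error:.3f}")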
A variation of the holdout method can be used when data is limited.
It is common practice to resample the data, that is, partition the data into
training and test sets in different ways. An inducer is trained and tested for
each partition and the accuracies averaged. By doing this, a more reliable
estimate of the true generalization error of the inducer is provided.
Random subsampling and n-fold cross-validation are two common
resampling methods. In random subsampling, the data is randomly
partitioned several times into disjoint training and test sets, and the errors
obtained from each partition are averaged. In n-fold cross-validation, the
data is randomly split into n mutually exclusive subsets (folds) of
approximately equal size. An inducer is trained and tested n times; each
time it is tested on one of the n folds and trained on the remaining n - 1
folds.
The cross-validation estimate of the generalization error is the overall
number of misclassifications divided by the number of examples in the data.
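A sketch of this estimate, assuming scikit-learn's KFold and the same illustrative data set and inducer as above: the misclassifications over the n held-out folds are summed and divided by the total number of examples.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
n_folds = 10

misclassified = 0
kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    # Train on the remaining n - 1 folds, test on the held-out fold.
    inducer = DecisionTreeClassifier(random_state=0)
    inducer.fit(X[train_idx], y[train_idx])
    misclassified += np.sum(inducer.predict(X[test_idx]) != y[test_idx])

# Cross-validation estimate: overall misclassifications / number of examples.
cv_error = misclassified / len(y)
print(f"{n_folds}-fold CV error estimate: {cv_error:.3f}")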
The random subsampling method has the advantage that it can be repeated
an indefinite number of times. However, a disadvantage is that the test sets
are not independently drawn with respect to the underlying distribution of
examples. Because of this, using a t-test for paired differences with random
subsampling can lead to an increased chance of type I error, i.e. identifying
a significant difference when one does not actually exist. Using a t-test on
the generalization error produced on each fold lowers the chances of type
I error but may not give a stable estimate of the generalization error. It
is common practice to repeat n-fold cross-validation n times in order to
provide a stable estimate. However, this, of course, renders the test sets
non-independent and increases the chance of type I error. Unfortunately,
there is no satisfactory solution to this problem. Alternative tests suggested
by Dietterich (1998) have a low probability of type I error but a higher
chance of type II error, that is, failing to identify a significant difference
when one does actually exist.
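As an illustration of the paired t-test on per-fold errors described above (not of Dietterich's alternative tests), a sketch comparing two hypothetical inducers with scipy.stats.ttest_rel might read as follows; all names are illustrative assumptions.

import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=10, shuffle=True, random_state=0)

errors_tree, errors_nb = [], []
for train_idx, test_idx in kf.split(X):
    for model, errors in ((DecisionTreeClassifier(random_state=0), errors_tree),
                          (GaussianNB(), errors_nb)):
        model.fit(X[train_idx], y[train_idx])
        errors.append(np.mean(model.predict(X[test_idx]) != y[test_idx]))

# Paired t-test on the per-fold error estimates of the two inducers.
t_stat, p_value = ttest_rel(errors_tree, errors_nb)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")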
Stratification is a process often applied during random subsampling
and n-fold cross-validation. Stratification ensures that the class distribution
of the full data set is approximately preserved in each training and test set.
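A brief sketch of stratified partitioning, assuming scikit-learn's StratifiedKFold (an illustrative choice): each fold keeps roughly the same class proportions as the whole data set.

from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)

# Each test fold keeps approximately the same class proportions as the full data.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    print(f"fold {fold}: test-set class counts = {Counter(y[test_idx])}")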