2.6.2.2 Cross-Validation
Cross-validation is a technique for estimating the generalization error of a
model from data that are not used for parameter estimation (training) [Stone
1974]. First, the set of available data is split into D disjoint subsets. Then,
the following steps are performed, for a family of functions having the same
complexity (e.g., neural networks with a given number of hidden neurons):
iteration i, to be performed D times: build a training set with D − 1 subsets
of the available data; perform several trainings, with different initial values
of the parameters; for each model, compute the mean square error (VMSE)
on the validation set made of the N_V remaining examples,

$$E_V = \sqrt{\frac{1}{N_V} \sum_{k=1}^{N_V} r_k^2},$$

where r_k is the modeling error on the k-th validation example; store the
smallest VMSE thus computed, E_{V_i};

compute the cross-validation score from the D quantities E_{V_i} obtained at
each of the D iterations,

$$\sqrt{\frac{1}{D} \sum_{i=1}^{D} E_{V_i}^2}.$$
That score is an estimate of the generalization error for the family of functions
thus investigated.
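To make the procedure concrete, here is a minimal NumPy sketch of the score computation. The training routine train_model(X, y, seed), its .predict method, and the restart count are hypothetical stand-ins for whatever model family is being assessed; they are not prescribed by the text.

```python
import numpy as np

def cross_validation_score(X, y, train_model, D=5, n_restarts=3, rng_seed=0):
    """Estimate the generalization error by D-fold cross-validation.

    `train_model(X_train, y_train, seed)` is an assumed interface that
    returns a fitted model exposing a `.predict(X)` method.
    """
    rng = np.random.default_rng(rng_seed)
    indices = rng.permutation(len(X))
    folds = np.array_split(indices, D)  # D disjoint subsets of the data

    E_V = []
    for i in range(D):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(D) if j != i])

        # Several trainings with different initial parameter values;
        # keep the smallest VMSE obtained on this validation fold.
        best_vmse = np.inf
        for seed in range(n_restarts):
            model = train_model(X[train_idx], y[train_idx], seed)
            residuals = y[val_idx] - model.predict(X[val_idx])
            vmse = np.sqrt(np.mean(residuals ** 2))
            best_vmse = min(best_vmse, vmse)
        E_V.append(best_vmse)

    # Cross-validation score: root of the average of the squared VMSEs.
    return np.sqrt(np.mean(np.square(E_V)))
```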
For instance, if D = 5 is chosen (a typical value; the process is then called
5-fold cross-validation), 5 different partitions of the database are constructed;
for each partition, 80% of the data are in the training set and 20% in the
validation set. As discussed above, the cross-validation score is the square
root of the average of the squared VMSEs computed on the partitions. That
average must be performed because 20% of the database may not be a statistically
significant sample of the distribution of all possible examples.

In a heuristic fashion, the procedure may be simplified by performing a single
partition of the database, choosing a validation set whose distribution is as
close as possible to that of the available examples. To that effect, one can
estimate the Kullback-Leibler divergence [Kullback and Leibler 1951; Kullback
1959] between two probability distributions p_1 and p_2,
$$D(p_1, p_2) = \int_{-\infty}^{+\infty} p_1(x) \ln\frac{p_1(x)}{p_2(x)}\,dx.$$
Because the expression is not symmetrical, a more satisfactory distance is
defined as
$$\Delta = \frac{1}{2}\left[D(p_1, p_2) + D(p_2, p_1)\right].$$
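As an illustration, a small sketch of one way to estimate Δ from two finite samples, using shared histogram bins; the bin count and the smoothing constant eps (which guards against empty bins) are arbitrary choices of this sketch, not part of the text.

```python
import numpy as np

def symmetrized_kl(sample1, sample2, n_bins=20, eps=1e-12):
    """Histogram estimate of Delta = (1/2)[D(p1, p2) + D(p2, p1)]."""
    sample1 = np.asarray(sample1, dtype=float)
    sample2 = np.asarray(sample2, dtype=float)
    # Shared bin edges so both histograms discretize x identically.
    lo = min(sample1.min(), sample2.min())
    hi = max(sample1.max(), sample2.max())
    edges = np.linspace(lo, hi, n_bins + 1)

    c1, _ = np.histogram(sample1, bins=edges)
    c2, _ = np.histogram(sample2, bins=edges)
    # Smooth empty bins with eps, then normalize to probabilities.
    p1 = (c1 + eps) / (c1 + eps).sum()
    p2 = (c2 + eps) / (c2 + eps).sum()

    d12 = np.sum(p1 * np.log(p1 / p2))  # D(p1, p2)
    d21 = np.sum(p2 * np.log(p2 / p1))  # D(p2, p1)
    return 0.5 * (d12 + d21)
```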
Several random partitions of the database are performed, and the partition for
which the distance Δ between the validation set and the training set is smallest
is selected.
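A sketch of that selection heuristic, reusing the symmetrized_kl helper from the sketch above. Comparing the distributions of the output values y is one plausible choice for measuring the distance between the two sets; the text does not prescribe which variable the divergence is computed on.

```python
import numpy as np

def best_single_partition(y, n_trials=50, val_fraction=0.2, seed=0):
    """Among n_trials random splits, keep the one whose validation set
    is closest in distribution (smallest Delta) to its training set."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n_val = int(val_fraction * len(y))

    best_delta, best_split = np.inf, None
    for _ in range(n_trials):
        idx = rng.permutation(len(y))
        val_idx, train_idx = idx[:n_val], idx[n_val:]
        delta = symmetrized_kl(y[train_idx], y[val_idx])
        if delta < best_delta:
            best_delta, best_split = delta, (train_idx, val_idx)
    return best_split
```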
 