2.6.2.2 Cross-Validation
Cross-validation is a technique for estimating the generalization error of a
model from data that are not used for parameter estimation (training) [Stone
1974]. First, the set of available data is split into D disjoint subsets. Then,
the following steps are performed, for a family of functions having the same
complexity (e.g., neural networks with a given number of hidden neurons):
• iteration i, to be performed D times: build a training set with D − 1 subsets
of the available data; perform several trainings, with different initial values
of the parameters; for each model, compute the mean square error (VMSE)
on the validation set made of the N_V remaining examples,

$$E_V = \sqrt{\frac{1}{N_V}\sum_{k=1}^{N_V} (r_k)^2};$$

store in memory the smallest VMSE thus computed, E_{Vi};
• compute the cross-validation score from the D quantities E_{Vi} obtained at
the D iterations,

$$\sqrt{\frac{1}{D}\sum_{i=1}^{D} \left(E_{Vi}\right)^2}.$$
That score is an estimate of the generalization error for the family of functions
thus investigated.
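The procedure above can be sketched as follows (a minimal illustration: a simple least-squares straight-line fit stands in for the neural-network trainings, and, since that fit is deterministic, the repeated trainings with different initial parameter values are omitted; the function names are illustrative, not from the text):

```python
import numpy as np

def cross_validation_score(x, y, fit, predict, D=5):
    """D-fold cross-validation score as defined in the text:
    sqrt((1/D) * sum_i E_Vi^2), where E_Vi is the VMSE of fold i."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, D)             # D disjoint subsets
    vmse = []
    for i in range(D):
        val = folds[i]                         # the N_V remaining examples
        train = np.concatenate([folds[j] for j in range(D) if j != i])
        params = fit(x[train], y[train])       # "training" on D - 1 subsets
        r = y[val] - predict(params, x[val])   # residuals r_k
        vmse.append(np.sqrt(np.mean(r ** 2)))  # E_Vi = sqrt((1/N_V) sum r_k^2)
    return float(np.sqrt(np.mean(np.square(vmse))))  # sqrt((1/D) sum E_Vi^2)

# Example: a straight-line model fitted to noisy linear data.
fit = lambda x, y: np.polyfit(x, y, deg=1)
predict = lambda p, x: np.polyval(p, x)
x = np.linspace(0.0, 1.0, 100)
y = 2.0 * x + 0.1 * np.random.default_rng(1).normal(size=100)
score = cross_validation_score(x, y, fit, predict, D=5)
```

With D = 5 this performs the 5-fold procedure discussed next: each fold uses 80% of the data for training and 20% for validation.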
For instance, if D = 5 is chosen (a typical value; the process is then called
5-fold cross-validation), 5 different partitions of the database are constructed;
for each partition, 80% of the data are in the training set and 20% in the
validation set. As discussed above, the cross-validation score is the square
root of the average of the squared VMSE's computed on each partition. That
average must be performed because 20% of the database may not be a statistically
significant sample of the distribution of all possible examples. In a heuristic
fashion, the procedure may be simplified by performing a single partition of
the database, choosing a validation set that is as close as possible to the
distribution of the available examples. To that effect, one can estimate the
Kullback-Leibler divergence [Kullback et al. 1951; Kullback 1959] between
two probability distributions p_1 and p_2,

$$D(p_1, p_2) = \int_{-\infty}^{+\infty} p_1(x)\,\ln\frac{p_1(x)}{p_2(x)}\,dx.$$
Because the expression is not symmetrical, a more satisfactory distance is
defined as

$$\Delta = \frac{1}{2}\left[D(p_1, p_2) + D(p_2, p_1)\right].$$
Several random partitions of the database are performed, and the partition for
which the distance between the validation set and the training set is smallest
is selected.
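This partition-selection heuristic can be sketched as follows (a minimal illustration, assuming the distributions p_1 and p_2 are approximated by histograms of the training and validation sets; the bin count, the number of random partitions, and the small regularizing epsilon are illustrative choices, not from the text):

```python
import numpy as np

def kl_divergence(p1, p2, eps=1e-12):
    """Discrete approximation of D(p1, p2) = sum p1 * ln(p1 / p2);
    a small epsilon avoids division by zero in empty histogram bins."""
    p1 = np.asarray(p1, dtype=float) + eps
    p2 = np.asarray(p2, dtype=float) + eps
    p1 /= p1.sum()
    p2 /= p2.sum()
    return float(np.sum(p1 * np.log(p1 / p2)))

def symmetric_distance(p1, p2):
    """Delta = (1/2) [D(p1, p2) + D(p2, p1)]."""
    return 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))

# Among several random 80/20 partitions, keep the one whose validation
# set is closest (in Delta) to the training set:
rng = np.random.default_rng(0)
data = rng.normal(size=500)
bins = np.linspace(data.min(), data.max(), 11)

best = None
for _ in range(20):                        # several random partitions
    idx = rng.permutation(len(data))
    train, val = data[idx[:400]], data[idx[400:]]
    h_train, _ = np.histogram(train, bins=bins)
    h_val, _ = np.histogram(val, bins=bins)
    delta = symmetric_distance(h_train, h_val)
    if best is None or delta < best[0]:
        best = (delta, train, val)
```

Since each KL term is nonnegative, Delta is nonnegative and vanishes only when the two histograms coincide, which is what makes it usable as a distance for ranking partitions.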