is retained. That single partition can satisfactorily be used for estimating the
generalization error. Since randomly drawing a large number of partitions
and computing the Kullback-Leibler divergence is much faster than training a
model, the computation time is divided roughly by a factor of 5 as compared to
complete 5-fold cross-validation. Under the assumption that the two distributions
are Gaussians p1(μ1, σ1) and p2(μ2, σ2), the Kullback-Leibler divergence can
be written as

∆(p1, p2) = [(σ1² − σ2²)² + (μ1 − μ2)²(σ1² + σ2²)] / (4 σ1² σ2²).
The proof of that relation is given in the additional material at the end of
the chapter.
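The closed form above is straightforward to evaluate. The sketch below (a minimal illustration, with function and variable names chosen here, not taken from the chapter) computes the divergence from the means and standard deviations of two Gaussians:

```python
import math

def kl_symmetric(mu1, sigma1, mu2, sigma2):
    """Kullback-Leibler divergence between two Gaussians p1(mu1, sigma1)
    and p2(mu2, sigma2), using the closed form given above."""
    v1, v2 = sigma1 ** 2, sigma2 ** 2
    return ((v1 - v2) ** 2 + (mu1 - mu2) ** 2 * (v1 + v2)) / (4.0 * v1 * v2)

# Identical distributions give zero divergence.
print(kl_symmetric(0.0, 1.0, 0.0, 1.0))  # 0.0
# The divergence grows as the means move apart.
print(kl_symmetric(0.0, 1.0, 1.0, 1.0))  # 0.5
```

In the heuristic procedure described above, μ and σ would be estimated from the errors observed on each random partition, and the partition whose divergence is closest to typical would be retained.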
That heuristic procedure is very useful for fast prototyping of an initial
model, which can be further refined by conventional cross-validation or by the
virtual leave-one-out technique that is explained below.
2.6.2.3 Model Selection by Cross-Validation
Model design starts from the simplest model (a linear model), and gradually in-
creases the complexity (for neural models, by increasing the number of hidden
neurons).
One might also increase the number of hidden layers; for modeling prob-
lems, that can be considered in a second step of the design: if a satisfactory
model has been found with one hidden layer, one can, time permitting, try
to improve the performance by increasing the number of hidden layers, while
decreasing the number of neurons per layer. That procedure sometimes leads
to some improvement, usually a marginal one. Conversely, if no satisfactory
model has been found with one hidden layer, increasing the number of layers
will not do any good.
For each family of models, a cross-validation score is computed as explained
above. When overfitting occurs, the cross-validation score increases when the
complexity of the model increases. Therefore, the procedure is terminated
when the score starts increasing. The model that has the smallest VMSE is
selected.
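The procedure can be sketched as follows. This is an illustrative toy example, not the chapter's own experiment: a polynomial degree stands in for model complexity (the number of hidden neurons for a neural model), and the data and fold construction are assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): a noisy one-dimensional nonlinear relation.
x = np.linspace(-1.0, 1.0, 120)
y = np.exp(2.0 * x) + 0.1 * rng.standard_normal(x.size)

# One fixed 5-fold partition, shared by all candidate models.
folds = np.array_split(rng.permutation(x.size), 5)

def cv_score(degree):
    """Cross-validation score (validation mean square error, VMSE) of a
    polynomial model of the given degree."""
    sq_errors = []
    for k, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        coeffs = np.polyfit(x[train], y[train], degree)
        sq_errors.append((np.polyval(coeffs, x[val]) - y[val]) ** 2)
    return float(np.mean(np.concatenate(sq_errors)))

# Start from the simplest model; terminate when the score starts increasing,
# i.e. when overfitting sets in, and keep the model with the smallest VMSE.
best_degree, best_score = 1, cv_score(1)
for degree in range(2, 15):
    score = cv_score(degree)
    if score > best_score:
        break
    best_degree, best_score = degree, score
print(best_degree, best_score)
```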
2.6.2.4 Leave-One-Out
The estimation of the generalization error by leave-one-out is a special case
of cross-validation, for which D = N: at iteration k, example k is withdrawn
from the training set, trainings are performed (with different initial values
of the parameters) with the N − 1 remaining examples of the training set; for
each model, the prediction error on the withdrawn example k is computed, and
the smallest prediction error on the withdrawn example, denoted r_k^(−k),
is stored.
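The iteration just described can be sketched as follows. This is a minimal illustration under assumptions made here: a least-squares line stands in for the neural model, a small perturbation mimics the effect of different initial parameter values, and the final root-mean-square aggregation of the stored residuals is one common choice, not a definition taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data (hypothetical): N examples of a noisy linear relation.
N = 30
x = rng.uniform(-1.0, 1.0, N)
y = 2.0 * x + 0.1 * rng.standard_normal(N)

def train(x_train, y_train, seed):
    """Stand-in for one training run: a least-squares line, perturbed
    slightly to mimic different initial parameter values (a real neural
    model would be retrained from each initialization)."""
    slope, intercept = np.polyfit(x_train, y_train, 1)
    slope += 0.01 * np.random.default_rng(seed).standard_normal()
    return slope, intercept

residuals = []
for k in range(N):
    mask = np.arange(N) != k              # withdraw example k
    errors = []
    for seed in range(3):                 # several trainings, different inits
        slope, intercept = train(x[mask], y[mask], seed)
        errors.append(abs(y[k] - (slope * x[k] + intercept)))
    residuals.append(min(errors))         # smallest error r_k^(-k) is stored
loo_score = float(np.sqrt(np.mean(np.square(residuals))))
print(loo_score)
```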
The leave-one-out score