is retained. That single partition can satisfactorily be used for estimating the
generalization error. Since randomly drawing a large number of partitions
and computing the Kullback-Leibler divergence is much faster than training a
model, the computation time is divided roughly by a factor of 5 as compared to
complete 5-fold cross-validation. Under the assumption that the two distributions
are Gaussians p1(μ1, σ1) and p2(μ2, σ2), the Kullback-Leibler divergence can
be written as

∆(p1, p2) = [(σ1² − σ2²)² + (μ1 − μ2)²(σ1² + σ2²)] / (4 σ1² σ2²).
The proof of that relation is given in the additional material at the end of
the chapter.
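The closed form above is straightforward to evaluate. The sketch below (a minimal illustration, with function and variable names chosen here, not taken from the chapter) computes the divergence from the means and standard deviations of two Gaussians:

```python
import math

def kl_symmetric(mu1, sigma1, mu2, sigma2):
    """Kullback-Leibler divergence between two Gaussians p1(mu1, sigma1)
    and p2(mu2, sigma2), using the closed form given above."""
    v1, v2 = sigma1 ** 2, sigma2 ** 2
    return ((v1 - v2) ** 2 + (mu1 - mu2) ** 2 * (v1 + v2)) / (4.0 * v1 * v2)

# Identical distributions give zero divergence.
print(kl_symmetric(0.0, 1.0, 0.0, 1.0))  # 0.0
# The divergence grows as the means move apart.
print(kl_symmetric(0.0, 1.0, 1.0, 1.0))  # 0.5
```

In the heuristic procedure described above, μ and σ would be estimated from the errors observed on each random partition, and the partition whose divergence is closest to typical would be retained.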
That heuristic procedure is very useful for fast prototyping of an initial
model, which can be further refined by conventional cross-validation or by the
virtual leave-one-out technique that is explained below.
2.6.2.3 Model Selection by Cross-Validation
Model design starts from the simplest model (a linear model), and gradually in-
creases the complexity (for neural models, by increasing the number of hidden
neurons).
One might also increase the number of hidden layers; for modeling prob-
lems, that can be considered in a second step of the design: if a satisfactory
model has been found with one hidden layer, one can, time permitting, try
to improve the performance by increasing the number of hidden layers, while
decreasing the number of neurons per layer. That procedure sometimes leads
to some improvement, usually a marginal one. Conversely, if no satisfactory
model has been found with one hidden layer, increasing the number of layers
will not do any good.
For each family of models, a cross-validation score is computed as explained
above. When overfitting occurs, the cross-validation score increases when the
complexity of the model increases. Therefore, the procedure is terminated
when the score starts increasing. The model that has the smallest VMSE is
selected.
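The procedure can be sketched as follows. This is an illustrative toy example, not the chapter's own experiment: a polynomial degree stands in for model complexity (the number of hidden neurons for a neural model), and the data and fold construction are assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): a noisy one-dimensional nonlinear relation.
x = np.linspace(-1.0, 1.0, 120)
y = np.exp(2.0 * x) + 0.1 * rng.standard_normal(x.size)

# One fixed 5-fold partition, shared by all candidate models.
folds = np.array_split(rng.permutation(x.size), 5)

def cv_score(degree):
    """Cross-validation score (validation mean square error, VMSE) of a
    polynomial model of the given degree."""
    sq_errors = []
    for k, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        coeffs = np.polyfit(x[train], y[train], degree)
        sq_errors.append((np.polyval(coeffs, x[val]) - y[val]) ** 2)
    return float(np.mean(np.concatenate(sq_errors)))

# Start from the simplest model; terminate when the score starts increasing,
# i.e. when overfitting sets in, and keep the model with the smallest VMSE.
best_degree, best_score = 1, cv_score(1)
for degree in range(2, 15):
    score = cv_score(degree)
    if score > best_score:
        break
    best_degree, best_score = degree, score
print(best_degree, best_score)
```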
2.6.2.4 Leave-One-Out
The estimation of the generalization error by leave-one-out is a special case
of cross-validation, for which D = N: at iteration k, example k is withdrawn
from the training set, trainings are performed (with different initial values
of the parameters) with the N − 1 remaining examples of the training set; for
each model, the prediction error on the withdrawn example k is computed, and
the smallest prediction error on the withdrawn example, denoted r_k^(−k),
is stored.
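The iteration just described can be sketched as follows. This is a minimal illustration under assumptions made here: a least-squares line stands in for the neural model, a small perturbation mimics the effect of different initial parameter values, and the final root-mean-square aggregation of the stored residuals is one common choice, not a definition taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data (hypothetical): N examples of a noisy linear relation.
N = 30
x = rng.uniform(-1.0, 1.0, N)
y = 2.0 * x + 0.1 * rng.standard_normal(N)

def train(x_train, y_train, seed):
    """Stand-in for one training run: a least-squares line, perturbed
    slightly to mimic different initial parameter values (a real neural
    model would be retrained from each initialization)."""
    slope, intercept = np.polyfit(x_train, y_train, 1)
    slope += 0.01 * np.random.default_rng(seed).standard_normal()
    return slope, intercept

residuals = []
for k in range(N):
    mask = np.arange(N) != k              # withdraw example k
    errors = []
    for seed in range(3):                 # several trainings, different inits
        slope, intercept = train(x[mask], y[mask], seed)
        errors.append(abs(y[k] - (slope * x[k] + intercept)))
    residuals.append(min(errors))         # smallest error r_k^(-k) is stored
loo_score = float(np.sqrt(np.mean(np.square(residuals))))
print(loo_score)
```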
The leave-one-out score