Fig. 1.14. The most parsimonious neural network has the best generalization abilities.
where $N_V$ is the number of observations present in the validation set, and
where, for simplicity, $y_k$ denotes the measurements of the quantity to be
modeled: $y_k = y_p(x_k)$. This relation is valid in the usual case of a model with a
single output; if the model has several outputs, the VMSE is the sum of the
mean square errors on each output.
This quantity should be compared with the mean square error on the
training set (TMSE),
$$
\mathrm{TMSE} = \frac{1}{N_T} \sum_{k=1}^{N_T} \left[ y_k - g(x_k, w) \right]^2,
$$
where $N_T$ is the number of observations present in the training set.
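The TMSE defined above can be computed directly from its formula. The following is a minimal sketch, in which the affine model $g$ and the weights $w$ are illustrative assumptions, not taken from the text:

```python
import numpy as np

def tmse(y, x, g, w):
    """Mean square error on the training set: (1/N_T) * sum_k [y_k - g(x_k, w)]^2."""
    residuals = y - g(x, w)          # y_k - g(x_k, w) for each observation k
    return np.mean(residuals ** 2)   # average of the squared residuals

# Hypothetical affine model g(x, w) = w[0] + w[1] * x on a small training set
g = lambda x, w: w[0] + w[1] * x
x_obs = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = np.array([0.1, 1.9, 4.1, 5.9])
print(tmse(y_obs, x_obs, g, np.array([0.0, 2.0])))
```

The VMSE is computed by the same formula, with the sum running over the $N_V$ observations of the validation set instead.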
Consider the example shown in Fig. 1.14, and assume that the observations
of the validation set are the midpoints between the observations of the
training set. Clearly, the TMSE of the second network is certainly smaller than
the TMSE of the first network, whereas the VMSE of the second network is
certainly larger than that of the first network. Therefore, if model selection
were performed on the basis of the training mean square error, overparame-
terized networks would systematically be favored, thereby leading to models
that exhibit overfitting.
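The same phenomenon can be reproduced with a simple numerical sketch. Here, polynomial fits stand in for the two networks of Fig. 1.14 (the degrees, the target function, and the noise level are assumptions for illustration); the validation abscissas are taken at the midpoints of the training abscissas, as in the argument above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a smooth function for training
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)

# Validation observations at the midpoints of the training abscissas
x_val = (x_train[:-1] + x_train[1:]) / 2
y_val = np.sin(2 * np.pi * x_val) + rng.normal(0.0, 0.2, x_val.size)

results = {}
for degree in (3, 9):   # parsimonious model vs overparameterized model
    w = np.polyfit(x_train, y_train, degree)
    tmse = np.mean((y_train - np.polyval(w, x_train)) ** 2)
    vmse = np.mean((y_val - np.polyval(w, x_val)) ** 2)
    results[degree] = (tmse, vmse)
    print(f"degree {degree}: TMSE = {tmse:.4f}, VMSE = {vmse:.4f}")
```

The degree-9 polynomial passes through all ten training points, so its TMSE is essentially zero, while its oscillations between the training points inflate its VMSE; selecting on TMSE alone would favor it, which is exactly the overfitting trap described above.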
 