The fact that the global minimum of the cost function, for a family of
models of given complexity, gives rise to a model whose Jacobian matrix
does not have full rank does not mean that all models of the same
complexity must be discarded: a local minimum may give rise to a perfectly
valid model even though the global minimum gives rise to an overfitted one.
That strategy is somewhat similar to early stopping: selecting a model that
does not correspond to the global minimum of the cost function may be a form
of regularization.
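
The rank test mentioned above can be sketched numerically as follows. This is only an illustration under stated assumptions: the Jacobian is taken to be available as an N x q array `Z` (one row per example, one column per parameter), and the function name `has_full_rank` and the two toy matrices are hypothetical, not taken from the text.

```python
import numpy as np

def has_full_rank(Z, tol=None):
    """Return True if the (N, q) Jacobian matrix Z has rank q (hypothetical helper)."""
    Z = np.asarray(Z, dtype=float)
    # matrix_rank counts the singular values above a tolerance
    return bool(np.linalg.matrix_rank(Z, tol=tol) == Z.shape[1])

# Toy Jacobians (assumed values): one row per example, one column per parameter.
Z_ok = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
Z_bad = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # second column = 2 * first

print(has_full_rank(Z_ok), has_full_rank(Z_bad))
```

A model whose Jacobian fails this test would be set aside, while other minima of the same architecture are kept as candidates for the selection step.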
In order to perform a selection among the surviving models, the virtual
leave-one-out technique is used. The leave-one-out score was defined above as

$$E_p = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( \frac{r_k}{1 - h_{kk}} \right)^2},$$

which is an unbiased estimate of the generalization error.
That score must be compared to the mean square error on the training set
(TMSE),

$$E_T = \sqrt{\frac{1}{N_T} \sum_{k=1}^{N_T} (r_k)^2}.$$

It should be remembered that, in virtual leave-one-out, training is performed
with all available data; hence the same quantity N is involved in $E_p$ and
$E_T$ in the present case (that is, $N_T = N$).
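
As a minimal sketch of how the two scores compare in practice, the code below computes the virtual leave-one-out score and the TMSE from the residuals r_k and the leverages h_kk. Several things here are assumptions made for illustration: the model is a cubic polynomial (linear in its parameters, so its Jacobian is simply the design matrix `Z`), the data are synthetic, and the helper names `leverages`, `loo_score` and `tmse` are hypothetical. The leverages are taken to be the diagonal elements of the projection matrix Z (Z^T Z)^{-1} Z^T.

```python
import numpy as np

def leverages(Z):
    """Diagonal of Z (Z^T Z)^{-1} Z^T, computed from a thin QR factorization."""
    Q, _ = np.linalg.qr(Z)           # Z = Q R, Q has orthonormal columns
    return np.sum(Q**2, axis=1)      # h_kk = squared norm of row k of Q

def loo_score(r, h):
    """Virtual leave-one-out score E_p from residuals r and leverages h."""
    return np.sqrt(np.mean((r / (1.0 - h))**2))

def tmse(r):
    """Score E_T on the training set, from the residuals r."""
    return np.sqrt(np.mean(r**2))

# Synthetic data (assumed for the example).
rng = np.random.default_rng(0)
N = 30
x = np.linspace(-1.0, 1.0, N)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(N)

Z = np.vander(x, 4)                            # cubic polynomial, q = 4 parameters
theta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # training on all N examples
r = y - Z @ theta                              # residuals r_k
h = leverages(Z)                               # leverages h_kk

E_p = loo_score(r, h)
E_T = tmse(r)
print(E_p, E_T)
```

For a model that is linear in its parameters, the score obtained this way coincides with the one found by actually removing each example in turn and retraining; for a neural network the same formula is an approximation based on the Jacobian at the minimum.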
Generalization Error and TMSE
Since the leverages are positive and smaller than 1, each term
$r_k / (1 - h_{kk})$ is larger in magnitude than $r_k$, so that $E_p$ is
larger than the TMSE; very overfitted models have numerous leverages close
to 1, hence have an estimated generalization error that is much larger than
the TMSE.
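
The effect of leverages close to 1 can be illustrated with a small numerical experiment. The sketch below is an assumption-laden toy example, not the experiment of the text: it fits two polynomials (3 and 11 parameters) to the same 12 synthetic points and compares the ratio of the leave-one-out score to the TMSE; the helper names are hypothetical.

```python
import numpy as np

def leverages(Z):
    """Diagonal of Z (Z^T Z)^{-1} Z^T, computed from a thin QR factorization."""
    Q, _ = np.linalg.qr(Z)
    return np.sum(Q**2, axis=1)

def score_ratio(Z, y):
    """Return (leverages, E_p / E_T) for a least-squares fit of y on Z."""
    theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ theta
    h = leverages(Z)
    E_T = np.sqrt(np.mean(r**2))
    E_p = np.sqrt(np.mean((r / (1.0 - h))**2))
    return h, E_p / E_T

# Synthetic data (assumed for the example): N = 12 noisy samples.
rng = np.random.default_rng(1)
N = 12
x = np.linspace(-1.0, 1.0, N)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(N)

h_small, ratio_small = score_ratio(np.vander(x, 3), y)    # q = 3 parameters
h_large, ratio_large = score_ratio(np.vander(x, 11), y)   # q = 11 parameters

print("mean leverage:", h_small.mean(), h_large.mean())
print("E_p / E_T:    ", ratio_small, ratio_large)
```

In both fits the leverages sum to the number of parameters, so the model with 11 parameters for 12 examples necessarily has most of its leverages close to 1, and its ratio is much larger than that of the small model.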
The Case of Large Training Sets
If all leverages are equal to $q/N$, where $q$ denotes the number of
parameters of the model, one has $E_p = \frac{N}{N - q} E_T$. Hence $E_p$ and
$E_T$ are equal in the limit of very large training sets for a model without
overfitting, which makes sense since the difference between the TMSE and the
generalization error stems from the fact that the number of elements in the
training set is finite: if an infinite amount of data were available, the
regression would be known exactly.
As an illustration, consider a neural network with four hidden neurons,
trained with the Levenberg-Marquardt algorithm, from different
initializations, on the training set shown on Fig. 2.21. Five hundred
different trainings were performed. Figure 2.23 shows the results, with the
following conventions: