Active techniques: training is performed in order to avoid designing models that exhibit overfitting, by limiting the magnitude of the parameters; regularization methods [Tikhonov et al. 1977; Poggio et al. 1985] are implemented, as discussed in the present section.
The latter techniques are of special importance when large networks need to be designed; such is often the case in classification for visual pattern recognition, when a low-level representation is used (see the introduction to classification in Chap. 1). In such situations, overfitting cannot be avoided by limiting the number of parameters, since the number of inputs is a lower bound on the number of parameters: the only way of avoiding overfitting consists in limiting the amplitude of the parameters; it is even shown in [Bartlett et al. 1997] that, if a large network is designed, and if the training algorithm finds a small mean square error with parameters of small amplitude, then the generalization performance depends on the norm of the vector of parameters and is independent of the number of parameters.
There are essentially two families of regularization methods:
Early stopping consists in stopping training before a minimum of the cost
function is reached.
Penalty methods consist in adding a penalization term to the cost function in order to favor regular models. The cost function has the form $J' = J + \alpha \Omega$, where $J$ is, for instance, the least squares cost function, and $\Omega$ is a function of the weights. The most popular penalty function is $\Omega = \sum_i w_i^2$. The method involving that penalty function is called weight decay.
Both techniques will be discussed below.
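To make the weight-decay penalty concrete, the following sketch (not taken from the text) applies plain gradient descent to a linear least-squares model with the penalized cost $J' = J + \alpha \sum_i w_i^2$; the data, the learning rate and the value of $\alpha$ are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative sketch: gradient descent on a linear model with a
# weight-decay penalty J' = J + alpha * sum_i(w_i ** 2).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 examples, 5 inputs
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

alpha = 1e-2      # regularization constant (arbitrary choice)
lr = 1e-2         # gradient-descent step size (arbitrary choice)
w = np.zeros(5)

for _ in range(1000):
    residual = X @ w - y
    J_grad = 2 * X.T @ residual / len(y)   # gradient of the least-squares cost J
    penalty_grad = 2 * alpha * w           # gradient of alpha * sum(w_i ** 2)
    w -= lr * (J_grad + penalty_grad)      # weights are pulled toward zero: "weight decay"
```

The penalty term simply adds $2\alpha w_i$ to each component of the gradient, which shrinks the weights toward zero at every step; larger values of $\alpha$ favor more regular models.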
2.5.4.1 Early Stopping
Principle
As usual, training consists in iteratively minimizing a cost function, for instance the least squares cost function, whose value is computed on a training set. Regularization takes place through the stopping criterion: training is terminated before a minimum of the cost function is reached, so that the model does not fit the training data as well as it could, given the number of parameters that are available to it; thus overfitting is limited. The difficulty that arises is: when should training be stopped? The most popular method consists in monitoring the variation of the standard prediction error on a validation set, and in terminating training when that prediction error starts increasing.
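The sketch below illustrates that stopping rule under simple assumptions (a linear model, full-batch gradient descent, and a small "patience" counter, none of which come from the text): the mean square prediction error on a held-out validation set is evaluated after each epoch, training stops once that error has increased for several consecutive epochs, and the weights that gave the smallest validation error are kept.

```python
import numpy as np

# Illustrative sketch of early stopping: monitor the validation error and
# stop training when it starts increasing. Data and hyperparameters are
# arbitrary choices for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.5 * rng.normal(size=200)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

lr, patience = 1e-2, 5
w = np.zeros(5)
best_val, best_w, bad_epochs = np.inf, w.copy(), 0

for epoch in range(10_000):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad
    val_error = np.mean((X_val @ w - y_val) ** 2)   # prediction error on the validation set
    if val_error < best_val:
        best_val, best_w, bad_epochs = val_error, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                  # validation error keeps increasing: stop
            break

w = best_w   # keep the weights that gave the smallest validation error
```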
Example
We discuss an academic example from [Stricker 2000]. It is a two-class classification problem; as explained in Chap. 1, the output of the classifier should