Active techniques: training is performed in order to avoid designing models that exhibit overfitting, by limiting the magnitude of the parameters; regularization methods [Tikhonov et al. 1977; Poggio et al. 1985] are implemented, as discussed in the present section.
The latter techniques are of special importance when large networks need to be designed; such is often the case in classification for visual pattern recognition, when a low-level representation is used (see the introduction to classification in Chap. 1). In such situations, overfitting cannot be avoided by limiting the number of parameters, since the number of inputs is a lower bound on the number of parameters: the only way of avoiding overfitting consists in limiting the amplitude of the parameters; it is even shown in [Bartlett et al. 1997] that, if a large network is designed, and if the training algorithm finds a small mean square error with parameters of small amplitude, then the generalization performance depends on the norm of the vector of parameters and is independent of the number of parameters.
There are essentially two families of regularization methods:
Early stopping consists in stopping training before a minimum of the cost
function is reached.
Penalty methods consist in adding a penalization term to the cost function in order to favor regular models. The cost function has the form $J' = J + \alpha \Omega$, where $J$ is, for instance, the least squares cost function, and $\Omega$ is a function of the weights. The most popular penalty function is $\Omega = \sum_i w_i^2$. The method involving that penalty function is called weight decay.
Both techniques will be discussed below.
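To make the weight-decay penalty concrete, the following sketch (not taken from the text) applies plain gradient descent to a linear least-squares model with the penalized cost $J' = J + \alpha \sum_i w_i^2$; the data, the learning rate and the value of $\alpha$ are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative sketch: gradient descent on a linear model with a
# weight-decay penalty J' = J + alpha * sum_i(w_i ** 2).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 examples, 5 inputs
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

alpha = 1e-2      # regularization constant (arbitrary choice)
lr = 1e-2         # gradient-descent step size (arbitrary choice)
w = np.zeros(5)

for _ in range(1000):
    residual = X @ w - y
    J_grad = 2 * X.T @ residual / len(y)   # gradient of the least-squares cost J
    penalty_grad = 2 * alpha * w           # gradient of alpha * sum(w_i ** 2)
    w -= lr * (J_grad + penalty_grad)      # weights are pulled toward zero: "weight decay"
```

The penalty term simply adds $2\alpha w_i$ to each component of the gradient, which shrinks the weights toward zero at every step; larger values of $\alpha$ favor more regular models.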
2.5.4.1 Early Stopping
Principle
As usual, training consists in iteratively minimizing a cost function, for instance the least squares cost function, whose value is computed on a training set. Regularization takes place through the stopping criterion: training is terminated before a minimum of the cost function is reached, so that the model does not fit the training data as well as it could, given the number of parameters that are available to it; thus overfitting is limited. The difficulty that arises is: when should training be stopped? The most popular method consists in monitoring the variation of the standard prediction error on a validation set, and in terminating training when that prediction error starts increasing.
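The sketch below illustrates that stopping rule under simple assumptions (a linear model, full-batch gradient descent, and a small "patience" counter, none of which come from the text): the mean square prediction error on a held-out validation set is evaluated after each epoch, training stops once that error has increased for several consecutive epochs, and the weights that gave the smallest validation error are kept.

```python
import numpy as np

# Illustrative sketch of early stopping: monitor the validation error and
# stop training when it starts increasing. Data and hyperparameters are
# arbitrary choices for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.5 * rng.normal(size=200)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

lr, patience = 1e-2, 5
w = np.zeros(5)
best_val, best_w, bad_epochs = np.inf, w.copy(), 0

for epoch in range(10_000):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad
    val_error = np.mean((X_val @ w - y_val) ** 2)   # prediction error on the validation set
    if val_error < best_val:
        best_val, best_w, bad_epochs = val_error, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                  # validation error keeps increasing: stop
            break

w = best_w   # keep the weights that gave the smallest validation error
```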
Example
We discuss an academic example from [Stricker 2000]. It is a two-class classification problem; as explained in Chap. 1, the output of the classifier should