Fig. 2.15. Training without regularization: variation of the performance of a linear classifier as a function of the numbers of relevant and irrelevant documents in the training set
is discussed in detail in Chap. 1), regularization methods are mandatory to avoid overfitting. Figure 2.15 shows the variation of F on a test base, without regularization, as a function of the numbers of relevant and irrelevant documents present in the database. Clearly, the performance decreases, and the norm of the vector of parameters increases, as the number of examples in the training set decreases.
With the same training and test sets, training was performed with early stopping. The results (Fig. 2.17) show that the performance is improved for small numbers of examples in the training set, but degraded when numerous examples are available (F < 0.9), which is evidence that early stopping does not make the best use of the available data. The norm of the vector of parameters (not shown) remains very small.
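For concreteness, the following is a minimal sketch of one common way of implementing early stopping for a linear (logistic) classifier trained by gradient descent: the validation cost is monitored after each epoch, and training halts once it stops improving. The held-out validation split, learning rate, and patience threshold are illustrative assumptions, not details of the experiment described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_early_stopping(X_train, y_train, X_val, y_val,
                         lr=0.1, max_epochs=1000, patience=10):
    w = np.zeros(X_train.shape[1])   # connection weights
    b = 0.0                          # bias
    best_val_loss = np.inf
    best_w, best_b = w.copy(), b
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        # Gradient step on the cross-entropy cost (no regularization term).
        p = sigmoid(X_train @ w + b)
        w -= lr * (X_train.T @ (p - y_train)) / len(y_train)
        b -= lr * np.mean(p - y_train)

        # Monitor the cost on a held-out validation set.
        p_val = sigmoid(X_val @ w + b)
        val_loss = -np.mean(y_val * np.log(p_val + 1e-12)
                            + (1 - y_val) * np.log(1 - p_val + 1e-12))
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_w, best_b = w.copy(), b
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break   # stop before the validation cost starts rising

    # Return the parameters that achieved the best validation cost.
    return best_w, best_b
```

Because training is interrupted while the parameters are still close to their (small) initial values, the norm of the parameter vector remains small, which is consistent with the observation above.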
Weight decay was also implemented on the same example, with two hyperparameters: one for the bias (α_b = 0.001) and one for the connections between the inputs and the output neuron (α_1 = 1). The results are shown in Fig. 2.18; the performance is improved when the number of examples is small and, in contrast with early stopping, it remains satisfactory for large numbers of examples. As in the previous case, the norm of the vector of parameters stays small.
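A minimal sketch of weight decay with two hyperparameters is given below, assuming the same kind of logistic model trained by gradient descent. With the regularized cost J = E + (α_1/2)‖w‖² + (α_b/2)b², each decay term simply adds α times the corresponding parameter to its gradient. The default values of α_1 and α_b are the ones quoted above; the learning rate and loop structure are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_weight_decay(X, y, alpha_1=1.0, alpha_b=0.001,
                       lr=0.1, max_epochs=1000):
    w = np.zeros(X.shape[1])   # input-to-output connection weights
    b = 0.0                    # bias
    for _ in range(max_epochs):
        p = sigmoid(X @ w + b)
        # Gradient of the cross-entropy data term E, plus the decay
        # terms alpha_1 * w and alpha_b * b from the penalty.
        grad_w = X.T @ (p - y) / len(y) + alpha_1 * w
        grad_b = np.mean(p - y) + alpha_b * b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Penalizing the bias much more weakly than the weights (α_b ≪ α_1) is a standard design choice: the bias sets the overall operating point of the classifier and usually should not be shrunk as aggressively as the connection weights.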
Models whose outputs are not smooth enough can also be avoided by penalizing large values of the derivatives of the output with respect to the inputs [Bishop 1993].
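For the logistic model used above, such a derivative penalty can be written in closed form: since ∂y/∂x = σ′(z)w with z = w·x + b, the penalty Ω = (1/N)Σ_n σ′(z_n)²‖w‖² can be differentiated analytically using σ″(z) = σ′(z)(1 − 2σ(z)). The sketch below implements this first-derivative penalty under those assumptions; the penalty weight μ and training-loop details are illustrative, and this is not necessarily Bishop's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_derivative_penalty(X, y, mu=0.1, lr=0.1, max_epochs=1000):
    w = np.zeros(X.shape[1])
    b = 0.0
    N = len(y)
    for _ in range(max_epochs):
        z = X @ w + b
        p = sigmoid(z)
        s = p * (1.0 - p)                 # sigma'(z)
        # Gradient of the cross-entropy data term.
        grad_w = X.T @ (p - y) / N
        grad_b = np.mean(p - y)
        # Gradient of Omega = (1/N) sum_n s_n^2 ||w||^2, where the
        # dependence of s_n on the parameters contributes through
        # sigma''(z) = sigma'(z) (1 - 2 sigma(z)).
        w_norm2 = w @ w
        c = s * s * (1.0 - 2.0 * p)       # sigma'(z) * sigma''(z)
        grad_w += mu * (2.0 / N) * (np.sum(s * s) * w + w_norm2 * (X.T @ c))
        grad_b += mu * (2.0 / N) * w_norm2 * np.sum(c)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```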