Fig. 2.15. Training without regularization: variation of the performance of a linear classifier as a function of the numbers of relevant and irrelevant documents in the training set
is discussed in detail in Chap. 1), regularization methods are mandatory to avoid overfitting. Figure 2.15 shows the variation of F on a test base, without regularization, as a function of the numbers of relevant and irrelevant documents present in the database. Clearly, the performance decreases, and the norm of the vector of parameters increases, as the number of examples in the training set decreases.
With the same training and test sets, training was performed with early stopping. The results (Fig. 2.17) show that the performance is improved for small numbers of examples in the training set, but degraded when numerous examples are available (F < 0.9), which is evidence that early stopping does not make the best use of the available data. The norm of the vector of parameters (not shown) remains very small.
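For concreteness, the following is a minimal sketch of one common way of implementing early stopping for a linear (logistic) classifier trained by gradient descent: the validation cost is monitored after each epoch, and training halts once it stops improving. The held-out validation split, learning rate, and patience threshold are illustrative assumptions, not details of the experiment described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_early_stopping(X_train, y_train, X_val, y_val,
                         lr=0.1, max_epochs=1000, patience=10):
    w = np.zeros(X_train.shape[1])   # connection weights
    b = 0.0                          # bias
    best_val_loss = np.inf
    best_w, best_b = w.copy(), b
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        # Gradient step on the cross-entropy cost (no regularization term).
        p = sigmoid(X_train @ w + b)
        w -= lr * (X_train.T @ (p - y_train)) / len(y_train)
        b -= lr * np.mean(p - y_train)

        # Monitor the cost on a held-out validation set.
        p_val = sigmoid(X_val @ w + b)
        val_loss = -np.mean(y_val * np.log(p_val + 1e-12)
                            + (1 - y_val) * np.log(1 - p_val + 1e-12))
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_w, best_b = w.copy(), b
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break   # stop before the validation cost starts rising

    # Return the parameters that achieved the best validation cost.
    return best_w, best_b
```

Because training is interrupted while the parameters are still close to their (small) initial values, the norm of the parameter vector remains small, which is consistent with the observation above.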
Weight decay was also implemented on the same example, with two hyperparameters: one for the bias (α_b = 0.001) and one for the connections between the inputs and the output neuron (α_1 = 1). The results are shown in Fig. 2.18; the performance is improved when the number of examples is small and, in contrast with early stopping, it remains satisfactory for large numbers of examples. As in the previous case, the norm of the vector of parameters stays small.
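A minimal sketch of weight decay with two hyperparameters is given below, assuming the same kind of logistic model trained by gradient descent. With the regularized cost J = E + (α_1/2)‖w‖² + (α_b/2)b², each decay term simply adds α times the corresponding parameter to its gradient. The default values of α_1 and α_b are the ones quoted above; the learning rate and loop structure are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_weight_decay(X, y, alpha_1=1.0, alpha_b=0.001,
                       lr=0.1, max_epochs=1000):
    w = np.zeros(X.shape[1])   # input-to-output connection weights
    b = 0.0                    # bias
    for _ in range(max_epochs):
        p = sigmoid(X @ w + b)
        # Gradient of the cross-entropy data term E, plus the decay
        # terms alpha_1 * w and alpha_b * b from the penalty.
        grad_w = X.T @ (p - y) / len(y) + alpha_1 * w
        grad_b = np.mean(p - y) + alpha_b * b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Penalizing the bias much more weakly than the weights (α_b ≪ α_1) is a standard design choice: the bias sets the overall operating point of the classifier and usually should not be shrunk as aggressively as the connection weights.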
Models whose outputs are not smooth enough can also be avoided by penalizing large values of the derivatives of the output with respect to the inputs [Bishop 1993].
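For the logistic model used above, such a derivative penalty can be written in closed form: since ∂y/∂x = σ′(z)w with z = w·x + b, the penalty Ω = (1/N)Σ_n σ′(z_n)²‖w‖² can be differentiated analytically using σ″(z) = σ′(z)(1 − 2σ(z)). The sketch below implements this first-derivative penalty under those assumptions; the penalty weight μ and training-loop details are illustrative, and this is not necessarily Bishop's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_derivative_penalty(X, y, mu=0.1, lr=0.1, max_epochs=1000):
    w = np.zeros(X.shape[1])
    b = 0.0
    N = len(y)
    for _ in range(max_epochs):
        z = X @ w + b
        p = sigmoid(z)
        s = p * (1.0 - p)                 # sigma'(z)
        # Gradient of the cross-entropy data term.
        grad_w = X.T @ (p - y) / N
        grad_b = np.mean(p - y)
        # Gradient of Omega = (1/N) sum_n s_n^2 ||w||^2, where the
        # dependence of s_n on the parameters contributes through
        # sigma''(z) = sigma'(z) (1 - 2 sigma(z)).
        w_norm2 = w @ w
        c = s * s * (1.0 - 2.0 * p)       # sigma'(z) * sigma''(z)
        grad_w += mu * (2.0 / N) * (np.sum(s * s) * w + w_norm2 * (X.T @ c))
        grad_b += mu * (2.0 / N) * w_norm2 * np.sum(c)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```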