Fig. 2.18. Training with regularization by weight decay: variation of the performance as a function of the number of relevant and irrelevant documents in the training set
2.5.5 Conclusion on the Training of Static Models
We have made the following distinctions:
•   The training of models that are linear with respect to their parameters vs. the training of models that are not linear with respect to their parameters.
•   Adaptive (on-line) training vs. nonadaptive (batch) training.
•   Training without regularization vs. training with regularization.
We have shown:
•   that the training of models that are linear with respect to their parameters (such as polynomials) can be performed easily with the traditional least-squares methods (a minimal sketch is given after this list), whereas the training of models that are nonlinear with respect to their parameters (such as neural networks) requires more complex methods that, however, are efficient and clearly understood: that is the price that must be paid for taking advantage of parsimony;
•   that training is generally performed nonadaptively, with efficient second-order minimization algorithms; if necessary, the model can be updated by adaptive methods in order to take into account slow drifts of the characteristics of the process (see the second sketch below);
•   that overfitting can be avoided by limiting the amplitude of the parameters of the model with a regularization method during training; that is especially necessary when the number of training examples is small (see the third sketch below).
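As a minimal sketch of the first point, the listing below fits a polynomial, a model that is linear in its parameters, by ordinary least squares. The synthetic data, the degree, and all names are illustrative assumptions, not taken from the text.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic observations of a process y = sin(x) + noise.
x = rng.uniform(-3.0, 3.0, size=50)
y = np.sin(x) + 0.1 * rng.standard_normal(50)

degree = 5
# Design matrix with one column per monomial x**k: the prediction
# Phi @ w is linear in the parameters w.
Phi = np.vander(x, degree + 1, increasing=True)

# Ordinary least squares: minimize ||y - Phi w||^2.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("fitted coefficients:", w)

Because the cost is quadratic in the parameters, the minimum is obtained in a single linear-algebra step; no iterative descent is needed.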
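For the second point, here is a sketch, under illustrative assumptions about the network and the data, of nonadaptive (batch) training of a small neural network with a second-order quasi-Newton algorithm (BFGS, via scipy), followed by a single adaptive gradient step of the kind that can track a slow drift of the process; the gradient is computed by finite differences purely for brevity.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=(100, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(100)

n_hidden = 5  # illustrative network size

def unpack(theta):
    # One hidden layer of tanh neurons and a linear output neuron.
    w1 = theta[:n_hidden].reshape(1, n_hidden)
    b1 = theta[n_hidden:2 * n_hidden]
    w2 = theta[2 * n_hidden:3 * n_hidden]
    b2 = theta[-1]
    return w1, b1, w2, b2

def predict(theta, x):
    w1, b1, w2, b2 = unpack(theta)
    return np.tanh(x @ w1 + b1) @ w2 + b2

def cost(theta):
    r = predict(theta, x) - y
    return 0.5 * np.mean(r ** 2)

# Nonadaptive (batch) training with a second-order method.
theta0 = 0.1 * rng.standard_normal(3 * n_hidden + 1)
theta = minimize(cost, theta0, method="BFGS").x

# Adaptive (on-line) update: one small gradient step per new
# observation, to follow a slow drift of the process.
def sample_grad(theta, xi, yi, eps=1e-6):
    g = np.empty_like(theta)
    base = 0.5 * (predict(theta, xi)[0] - yi) ** 2
    for k in range(theta.size):
        t = theta.copy()
        t[k] += eps
        g[k] = (0.5 * (predict(t, xi)[0] - yi) ** 2 - base) / eps
    return g

xi, yi = np.array([[0.5]]), 0.48  # one new observation
theta -= 0.01 * sample_grad(theta, xi, yi)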
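For the third point, the sketch below adds regularization by weight decay to a polynomial fit: an L2 penalty on the parameters keeps their amplitude small, which is most useful when, as here, the training set is small relative to the number of parameters. The value of the regularization constant alpha is an arbitrary illustrative choice.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=15)   # deliberately few examples
y = np.sin(x) + 0.1 * rng.standard_normal(15)

degree = 9                             # many parameters vs. data
Phi = np.vander(x, degree + 1, increasing=True)

alpha = 1e-2  # illustrative regularization constant
# Minimize ||y - Phi w||^2 + alpha ||w||^2: the penalty term keeps
# the parameter amplitudes small and thereby limits overfitting.
w_reg = np.linalg.solve(Phi.T @ Phi + alpha * np.eye(degree + 1),
                        Phi.T @ y)
print("max |w| with weight decay:", np.abs(w_reg).max())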
The next section discusses the problem of overfitting in a more general framework: model selection.