2.6 Model Selection
After variable selection and training, model selection is the third important
element of a model design methodology. We assume that several candidate
models have been trained, one of which must be chosen. The model should
be complex enough to capture the deterministic relations between the quantity
to be modeled and the factors that have a significant influence on it, yet not
so complex that it overfits. In other words, the selected model should embody
the best tradeoff between learning capacity and generalization capacity: a
model that learns too well fits the noise, hence generalizes poorly. That
tradeoff has been formalized under the term
bias-variance dilemma [Geman et al. 1992].
From a theoretical point of view, the model that is sought is the model for
which the theoretical cost function \( \int \bigl( y_p(x) - g(x, \mathbf{w}) \bigr)^2 \, p_X(x) \, \mathrm{d}x \) is minimal.
That quantity may be split into two terms:
- the bias⁵, which expresses the average, over all possible training sets
  (with all possible realizations of the random variables that model the
  noise), of the squared difference between the predictions of the model
  and the regression function;
- the variance, which expresses the sensitivity of the model to the training
  set (with its own realization of the noise).
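For a fixed input x, that split can be written explicitly; this is the standard bias-variance decomposition, where (in notation introduced here, not in the text) f denotes the regression function, w_D the parameters obtained from training set D, and E_D the expectation over all training sets:

```latex
E_D\!\left[\bigl(g(x,\mathbf{w}_D) - f(x)\bigr)^2\right]
  = \underbrace{\bigl(E_D[g(x,\mathbf{w}_D)] - f(x)\bigr)^2}_{\text{(squared) bias}}
  \;+\;
  \underbrace{E_D\!\left[\bigl(g(x,\mathbf{w}_D) - E_D[g(x,\mathbf{w}_D)]\bigr)^2\right]}_{\text{variance}}
```

The cross term vanishes when the expectation is taken, which is why the two contributions separate cleanly.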
Because the above theoretical cost function cannot be computed, the empirical
least squares cost function is minimized during training, as discussed in the
previous section.
Thus, a very complex model, with a large number of adjustable parameters,
may have a very low bias, i.e., the ability to fit the data whatever noise
they contain, but it is apt to have a very large variance, i.e., to depend
strongly on the specific realization of the noise present in the training
set. Conversely, a very simple model, with a small number of adjustable
parameters, may be insensitive to the noise present in the training data,
but turn out to be unable to approximate the regression function.
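As an illustration of that tradeoff (not taken from the text), the following sketch estimates the bias and variance of a simple and a complex model by repeatedly fitting each to noisy samples of a known regression function. The regression function sin(2πx), the noise level, the sample size, and the polynomial degrees are all assumptions chosen for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)   # assumed regression function
x_train = np.linspace(0.0, 1.0, 10)   # fixed design: 10 points per training set
x_test = np.linspace(0.0, 1.0, 50)    # grid on which bias and variance are estimated
n_sets, sigma = 300, 0.3              # number of training sets, noise std

def bias2_and_variance(degree):
    """Estimate squared bias and variance of a polynomial model of the
    given degree by averaging over many independent noisy training sets."""
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        y = f(x_train) + rng.normal(0.0, sigma, x_train.size)
        coeffs = np.polyfit(x_train, y, degree)
        preds[i] = np.polyval(coeffs, x_test)
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - f(x_test)) ** 2)  # squared bias, averaged over x
    variance = np.mean(preds.var(axis=0))          # variance, averaged over x
    return bias2, variance

b_simple, v_simple = bias2_and_variance(1)    # linear model: large bias, small variance
b_complex, v_complex = bias2_and_variance(9)  # near-interpolating model: the reverse
print(f"degree 1: bias^2 = {b_simple:.3f}, variance = {v_simple:.3f}")
print(f"degree 9: bias^2 = {b_complex:.3f}, variance = {v_complex:.3f}")
```

With these settings the linear model exhibits the larger bias (its average prediction cannot follow the sine) and the degree-9 model the larger variance (its predictions swing with each realization of the noise), reproducing the behavior described above.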
Figure 2.19 illustrates the behavior of two models g1(x) and g2(x) of
the same complexity (linear models) that have too large a bias and too
small a variance: the predictions of the two models, obtained with different
training sets, are almost identical, but they are very different from the
regression function. Conversely, Fig. 2.20 illustrates the behavior of two
models that have a low bias (they are close to the regression function) but
a large variance, since their predictions depend strongly on the training set.
The next two illustrations, and several elements of the present section, are
excerpts from [Monari 1999].
⁵ This should not be confused with the constant input of a model, which is
unfortunately also called bias.