2.6 Model Selection
After variable selection and training, model selection is the third important
element of a model design methodology. We assume that several candidate
models have been trained, one of which must be chosen. The model should
be complex enough to capture the deterministic relations between the quantity
to be modeled and the factors that have a significant influence on it, yet not
so complex that it overfits. In other words, the selected model should embody
the best tradeoff between learning capacity and generalization capacity: a
model that learns too well fits the noise, hence generalizes poorly. That
tradeoff has been formalized under the term
bias-variance dilemma [Geman et al. 1992].
From a theoretical point of view, the model that is sought is the model for
which the theoretical cost function \( \int \bigl( y_p(x) - g(x, \mathbf{w}) \bigr)^2 \, p_X(x) \, \mathrm{d}x \) is minimal.
That quantity may be split into two terms:
- the bias⁵, which expresses the average, over all possible training sets
  (with all possible realizations of the random variables that model the
  noise), of the squared difference between the predictions of the model
  and the regression function;
- the variance, which expresses the sensitivity of the model to the training
  set (with its own realization of the noise).
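For a fixed input x, that split can be written explicitly; this is the standard bias-variance decomposition, where (in notation introduced here, not in the text) f denotes the regression function, w_D the parameters obtained from training set D, and E_D the expectation over all training sets:

```latex
E_D\!\left[\bigl(g(x,\mathbf{w}_D) - f(x)\bigr)^2\right]
  = \underbrace{\bigl(E_D[g(x,\mathbf{w}_D)] - f(x)\bigr)^2}_{\text{(squared) bias}}
  \;+\;
  \underbrace{E_D\!\left[\bigl(g(x,\mathbf{w}_D) - E_D[g(x,\mathbf{w}_D)]\bigr)^2\right]}_{\text{variance}}
```

The cross term vanishes when the expectation is taken, which is why the two contributions separate cleanly.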
Because the above theoretical cost function cannot be computed, the empirical
least squares cost function is minimized during training, as discussed in the
previous section.
Thus, a very complex model, with a large number of adjustable parameters,
may have a very low bias, i.e., the ability to fit the data whatever noise
they contain, but it is apt to have a very large variance, i.e., to depend
strongly on the specific realization of the noise present in the training
set. Conversely, a very simple model, with a small number of adjustable
parameters, may be insensitive to the noise present in the training data,
but turn out to be unable to approximate the regression function.
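As an illustration of that tradeoff (not taken from the text), the following sketch estimates the bias and variance of a simple and a complex model by repeatedly fitting each to noisy samples of a known regression function. The regression function sin(2πx), the noise level, the sample size, and the polynomial degrees are all assumptions chosen for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)   # assumed regression function
x_train = np.linspace(0.0, 1.0, 10)   # fixed design: 10 points per training set
x_test = np.linspace(0.0, 1.0, 50)    # grid on which bias and variance are estimated
n_sets, sigma = 300, 0.3              # number of training sets, noise std

def bias2_and_variance(degree):
    """Estimate squared bias and variance of a polynomial model of the
    given degree by averaging over many independent noisy training sets."""
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        y = f(x_train) + rng.normal(0.0, sigma, x_train.size)
        coeffs = np.polyfit(x_train, y, degree)
        preds[i] = np.polyval(coeffs, x_test)
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - f(x_test)) ** 2)  # squared bias, averaged over x
    variance = np.mean(preds.var(axis=0))          # variance, averaged over x
    return bias2, variance

b_simple, v_simple = bias2_and_variance(1)    # linear model: large bias, small variance
b_complex, v_complex = bias2_and_variance(9)  # near-interpolating model: the reverse
print(f"degree 1: bias^2 = {b_simple:.3f}, variance = {v_simple:.3f}")
print(f"degree 9: bias^2 = {b_complex:.3f}, variance = {v_complex:.3f}")
```

With these settings the linear model exhibits the larger bias (its average prediction cannot follow the sine) and the degree-9 model the larger variance (its predictions swing with each realization of the noise), reproducing the behavior described above.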
Figure 2.19 illustrates the behavior of two models g1(x) and g2(x) of
the same complexity (linear models) that have too large a bias and too
small a variance: the predictions of the two models, obtained with different
training sets, are almost identical, but they are very different from the
regression function. Conversely, Fig. 2.20 illustrates the behavior of two
models that have a low bias (they are close to the regression function) but
a large variance, since their predictions depend strongly on the training set.
The next two illustrations, and several elements of the present section, are
excerpts from [Monari 1999].
⁵ This should not be confused with the constant input of a model, which is
unfortunately also called bias.