expectation value) and the noise (difference between the measured value and
its unknown expectation value). If the model is perfect, the prediction error
is due to the noise only. Therefore, one can obtain a leverage equal to zero if
and only if the family of functions within which the model is sought contains
the regression function.
If a leverage is very close to 1, the unit vector borne by axis k is very close
to the solution subspace; hence that example is almost perfectly learnt, and it
has a large influence on the parameters of the model. The prediction error on
that example is almost zero when the example is in the training set and it is
very large when the example is withdrawn from the training set. Therefore, the
model is overfitted to that example. The confidence interval on that example
is very small when the example is in the training set, and very large when it
is not in the training set.
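As an illustration, the leverages can be computed as the diagonal elements of the orthogonal projection matrix onto the solution subspace spanned by the columns of the Jacobian matrix Z. The following sketch (the Jacobian values are hypothetical, chosen only for illustration) uses NumPy:

```python
import numpy as np

def leverages(Z):
    """Leverages h_kk: diagonal of the orthogonal projection matrix
    onto the solution subspace spanned by the columns of the
    Jacobian Z (assumed to have full rank)."""
    # Economy QR factorization: the columns of Q are an orthonormal
    # basis of the solution subspace, so h_kk = squared norm of row k of Q.
    Q, _ = np.linalg.qr(Z)
    return np.sum(Q**2, axis=1)

# Toy Jacobian: N = 5 examples, q = 2 parameters (hypothetical values).
Z = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 10.0]])
h = leverages(Z)
# Each leverage lies between 0 and 1, the leverages sum to q, and the
# atypical last example has a leverage close to 1: it is almost
# perfectly learnt and strongly influences the parameters.
```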
The above interpretation of the leverages is central to the model selection
methodology that is discussed in the next section.
2.6.4 Model Selection Methodology by Combination of the Local
and Global Approaches
Assume that inputs have been selected as described in the sections devoted
to input selection. We try to design the best model given the available data.
We discuss here a constructive procedure, whereby the complexity of the
model is increased gradually until overfitting occurs. For didactic purposes,
we split the procedure into two steps:
For a family of functions of given complexity, nonlinear with respect to
its parameters (for instance, neural networks with a given number of hidden
neurons), several trainings are performed with all available data, with
different parameter initializations. Thus, several models are obtained; models
whose Jacobian matrices do not have full rank are discarded. The next section
explains how to choose among the models that were not discarded
on the basis of the rank of their Jacobian matrices.
For a model that is linear with respect to its parameters, that step is
very simple since the cost function has a single minimum: a single training is
performed with all available data.
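A minimal sketch of that step, with a toy one-hidden-neuron model and hypothetical data (plain gradient descent stands in for an actual training algorithm, and the learning rate, step count and data are illustrative assumptions), might look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N examples of a scalar input x and a target y.
x = np.linspace(-2.0, 2.0, 20)
y = np.tanh(1.5 * x)

def model(theta, x):
    """One hidden tanh neuron: f(x) = w2 * tanh(w1 * x + b1) + b2."""
    w1, b1, w2, b2 = theta
    return w2 * np.tanh(w1 * x + b1) + b2

def jacobian(theta, x):
    """Jacobian Z: one row per example, one column per parameter."""
    w1, b1, w2, b2 = theta
    t = np.tanh(w1 * x + b1)
    dt = 1.0 - t**2                        # derivative of tanh
    return np.column_stack([w2 * dt * x,   # d f / d w1
                            w2 * dt,       # d f / d b1
                            t,             # d f / d w2
                            np.ones_like(x)])  # d f / d b2

def train(theta0, n_steps=2000, lr=0.05):
    """Plain gradient descent on the least-squares cost (illustrative)."""
    theta = theta0.copy()
    for _ in range(n_steps):
        r = model(theta, x) - y
        theta -= lr * jacobian(theta, x).T @ r / len(x)
    return theta

# Several trainings with all available data, from different
# random parameter initializations.
candidates = [train(rng.normal(size=4)) for _ in range(5)]

# Discard models whose Jacobian does not have full rank.
kept = [th for th in candidates
        if np.all(np.isfinite(th))
        and np.linalg.matrix_rank(jacobian(th, x)) == len(th)]
```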
The previous step having been performed with families of models of
increasing complexity, the best model is selected as explained in the section
entitled “Selection of the best architecture”.
2.6.4.1 Model Selection Within a Family of Models of Given
Complexity: Global Criteria
For a given model complexity, several trainings are performed, and, at the end
of each training, the rank of the Jacobian matrix of the model thus designed
is computed. If that matrix does not have full rank, the model is discarded.