expectation value) and the noise (difference between the measured value and
its unknown expectation value). If the model is perfect, the prediction error
is due to the noise only. Therefore, one can obtain a leverage equal to zero if
and only if the family of functions within which the model is sought contains
the regression function.
If a leverage is very close to 1, the unit vector borne by axis k is very close
to the solution subspace; hence that example is almost perfectly learnt, and it
has a large influence on the parameters of the model. The prediction error on
that example is almost zero when the example is in the training set and it is
very large when the example is withdrawn from the training set. Therefore, the
model is overfitted to that example. The confidence interval on that example
is very small when the example is in the training set, and very large when it
is not in the training set.
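As an illustration, the leverages can be computed as the diagonal elements of the orthogonal projection matrix onto the solution subspace spanned by the columns of the Jacobian matrix Z. The following sketch (the Jacobian values are hypothetical, chosen only for illustration) uses NumPy:

```python
import numpy as np

def leverages(Z):
    """Leverages h_kk: diagonal of the orthogonal projection matrix
    onto the solution subspace spanned by the columns of the
    Jacobian Z (assumed to have full rank)."""
    # Economy QR factorization: the columns of Q are an orthonormal
    # basis of the solution subspace, so h_kk = squared norm of row k of Q.
    Q, _ = np.linalg.qr(Z)
    return np.sum(Q**2, axis=1)

# Toy Jacobian: N = 5 examples, q = 2 parameters (hypothetical values).
Z = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 10.0]])
h = leverages(Z)
# Each leverage lies between 0 and 1, the leverages sum to q, and the
# atypical last example has a leverage close to 1: it is almost
# perfectly learnt and strongly influences the parameters.
```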
The above interpretation of the leverages is central to the model selection
methodology that is discussed in the next section.
2.6.4 Model Selection Methodology by Combination of the Local
and Global Approaches
Assume that inputs have been selected as described in the sections devoted
to input selection. We try to design the best model given the available data.
We discuss here a constructive procedure, whereby the complexity of the
model is increased gradually until overfitting occurs. For didactic purposes,
we split the procedure into two steps:
For a family of functions of given complexity, nonlinear with respect to
its parameters (for instance, neural networks with a given number of hidden
neurons), several trainings are performed with all available data, with
different parameter initializations. Thus, several models are obtained; models
whose Jacobian matrices do not have full rank are discarded. The next section
explains how to choose among the models that were not discarded
on the basis of the rank of their Jacobian matrices.
For a model that is linear with respect to its parameters, that step is
very simple since the cost function has a single minimum: a single training is
performed with all available data.
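A minimal sketch of that step, with a toy one-hidden-neuron model and hypothetical data (plain gradient descent stands in for an actual training algorithm, and the learning rate, step count and data are illustrative assumptions), might look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N examples of a scalar input x and a target y.
x = np.linspace(-2.0, 2.0, 20)
y = np.tanh(1.5 * x)

def model(theta, x):
    """One hidden tanh neuron: f(x) = w2 * tanh(w1 * x + b1) + b2."""
    w1, b1, w2, b2 = theta
    return w2 * np.tanh(w1 * x + b1) + b2

def jacobian(theta, x):
    """Jacobian Z: one row per example, one column per parameter."""
    w1, b1, w2, b2 = theta
    t = np.tanh(w1 * x + b1)
    dt = 1.0 - t**2                        # derivative of tanh
    return np.column_stack([w2 * dt * x,   # d f / d w1
                            w2 * dt,       # d f / d b1
                            t,             # d f / d w2
                            np.ones_like(x)])  # d f / d b2

def train(theta0, n_steps=2000, lr=0.05):
    """Plain gradient descent on the least-squares cost (illustrative)."""
    theta = theta0.copy()
    for _ in range(n_steps):
        r = model(theta, x) - y
        theta -= lr * jacobian(theta, x).T @ r / len(x)
    return theta

# Several trainings with all available data, from different
# random parameter initializations.
candidates = [train(rng.normal(size=4)) for _ in range(5)]

# Discard models whose Jacobian does not have full rank.
kept = [th for th in candidates
        if np.all(np.isfinite(th))
        and np.linalg.matrix_rank(jacobian(th, x)) == len(th)]
```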
The previous step having been performed with families of models of
increasing complexity, the best model is selected as explained in the section
entitled “Selection of the best architecture”.
2.6.4.1 Model Selection Within a Family of Models of Given
Complexity: Global Criteria
For a given model complexity, several trainings are performed, and, at the end
of each training, the rank of the Jacobian matrix of the model thus designed
is computed. If that matrix does not have full rank, the model is discarded.