•	among models that have the same complexity, find the model that achieves the best bias-variance tradeoff;
•	among the best models that have different complexities, find the model that achieves the best bias-variance tradeoff.
All techniques that will be discussed below aim at (i) discarding models that are obviously prone to overfitting, and (ii) estimating the generalization error (or theoretical cost function) in order to find the model that has the smallest generalization error. As a preliminary step, we show how to discard models that are prone to overfitting; subsequent sections will discuss two model selection techniques:
•	a global method, which consists in estimating the generalization error: cross-validation;
•	a local method, whereby the influence of each example on the model is estimated: the local overfitting control via leverages (LOCL) method, which is based on the estimation of leverages and confidence intervals for the predictions of the model.
Finally, the above approaches will be combined into a complete methodology for the selection of nonlinear models.
2.6.1 Preliminary Step: Discarding Overfitted Models by Computing the Rank of the Jacobian Matrix
2.6.1.1 Introduction
In the section devoted to the estimation of the parameters of a model that is linear with respect to its parameters, we defined the matrix of observations Ξ; each column of that matrix has N elements, which are the values of a given variable for each example. Therefore, for a model with n variables, the matrix of observations is (N, n). For a model that is not linear with respect to its parameters, with a vector of q parameters w_LS, the equivalent of the observation matrix is the (N, q) Jacobian matrix Z; each column z_i of that matrix has N elements, which are the values of the partial derivatives of the output with respect to a given parameter: z_i = (∂g(x, w)/∂w_i)_{w = w_LS}.
It can easily be checked that, for a model that is linear with respect to its
parameters, the Jacobian matrix Z is identical to the observation matrix Ξ .
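As an illustration (not taken from the text), this identity can be checked numerically: for a hypothetical linear model g(x, w) = Ξw, a finite-difference estimate of each column z_i of the Jacobian recovers the corresponding column of the observation matrix Ξ. The dimensions and random data below are arbitrary choices for the sketch.

```python
import numpy as np

# Hypothetical linear model with n = 3 variables and N = 5 examples:
# the observation matrix Xi holds one example per row.
rng = np.random.default_rng(0)
Xi = rng.normal(size=(5, 3))
w = np.array([0.5, -1.2, 2.0])

# Central finite differences: column i of Z holds dg(x_k, w)/dw_i
# for every example k.
eps = 1e-6
Z = np.empty_like(Xi)
for i in range(3):
    dw = np.zeros(3)
    dw[i] = eps
    Z[:, i] = (Xi @ (w + dw) - Xi @ (w - dw)) / (2 * eps)

# For a linear model, the Jacobian coincides with the observation matrix.
print(np.allclose(Z, Xi))
```

Since the model is linear, the derivative with respect to w_i is exactly the i-th input variable, independently of the value of w at which it is evaluated.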
Thus, each column of the Jacobian matrix expresses the effect of the variation of a parameter on the output of the model. If the Jacobian matrix does not have full rank (i.e., if its rank is not equal to q), it can be concluded that the effects, on the model output, of two (or more) parameters are not independent. Therefore, some parameters of the model are under-determined: the model has too many parameters, hence its variance is certainly too large. Such a model should be discarded. Moreover, rank deficiency has an adverse effect on training [Saarinen et al. 1993; Zhou et al. 1998].