$$E_t = \frac{1}{N} \sum_{k=1}^{N} \left( r_k^{(-k)} \right)^2$$
is computed. As in the case of cross-validation, models of increasing complexity are designed, until the leave-one-out score starts to increase with complexity.
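To make the procedure concrete, here is a minimal Python sketch of the computation of $E_t$; the `fit` and `predict` callables are hypothetical stand-ins for whatever training algorithm is used, not names from the text.

```python
import numpy as np

def leave_one_out_score(fit, predict, X, y):
    """Leave-one-out score E_t = (1/N) * sum_k (r_k^(-k))^2, where
    r_k^(-k) is the residual on example k of a model trained on all
    examples except k. `fit(X, y)` and `predict(model, x)` are
    hypothetical placeholders for the actual training procedure."""
    N = len(y)
    squared_residuals = np.empty(N)
    for k in range(N):
        keep = np.arange(N) != k            # withdraw example k
        model = fit(X[keep], y[keep])       # retrain on the N-1 remaining examples
        r_k = y[k] - predict(model, X[k])   # residual on the withdrawn example
        squared_residuals[k] = r_k ** 2
    return squared_residuals.mean()
```

In a model-selection loop, this score would be computed for each candidate complexity, and one would retain the largest complexity reached before the score starts to rise.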
The main drawback of the leave-one-out technique is that it is computationally very demanding, since N models must be trained; on the other hand, it can be shown that the leave-one-out score is an unbiased estimator of the generalization error [Vapnik 1995].
In the next section, we discuss a slightly different technique, whose computation time is roughly the computation time of leave-one-out divided by a factor of N (the number of examples). It is based on the fact that the withdrawal of a single example from the training set should not lead to a very different model, so that a model that is locally linear in parameter space, in the neighborhood of the minimum of the cost function, can be designed; powerful results from the theory of linear regression can therefore be brought to bear.
2.6.3 Local Least Squares: Effect of Withdrawing an Example
from the Training Set, and Virtual Leave-One-Out
In the present section, we show that the effect of withdrawing an example from
the training set on a nonlinear model can be predicted. Specifically, we prove
that the modeling error made by the model on the withdrawn example can
be accurately predicted without actually withdrawing the example (virtual
leave-one-out), and that a confidence interval on the predictions of the model
can be estimated. Finally, we show that the influence of an observation on
the model can be summarized with a single parameter: the leverage of the
observation.
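To anticipate the main result in computational form: in linear least squares, the residual on a withdrawn example obeys the classical identity $r_k^{(-k)} = r_k / (1 - h_{kk})$, where $h_{kk}$ is the leverage of observation $k$; the derivation below extends this locally to nonlinear models. A minimal numpy sketch of that identity (the function name is ours, and $Z$ and the residuals are assumed available from training):

```python
import numpy as np

def virtual_loo_residuals(Z, r):
    """Virtual leave-one-out: estimate the residual on each withdrawn
    example without retraining, via r_k^(-k) = r_k / (1 - h_kk).
    Exact for linear models, a local approximation for nonlinear ones.
    Z : (N, q) Jacobian of the model (matrix of observations in the
        linear case); r : (N,) residuals of the model trained on all
        N examples. The leverages h_kk are the diagonal elements of
        the hat matrix H = Z (Z^T Z)^{-1} Z^T."""
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # hat (projection) matrix
    h = np.diag(H)                          # leverages h_kk
    return r / (1.0 - h)
```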
2.6.3.1 Local Approximation of the Least Squares Method
Consider a model $g(x, w)$. A first-order Taylor expansion of the model, in parameter space, in the neighborhood of the minimum $w^*$ of the cost function, can be written as

$$g(x, w) \approx g(x, w^*) + Z\,(w - w^*),$$
where g is the vector of the N predictions of the model, and where Z is
the Jacobian matrix of the model, as defined above. That model is linear
with respect to its parameters, and matrix Z is equivalent to the matrix of
observations.
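As an illustration, here is a small numpy sketch of this linearization; the model function `g(x, w)` is a hypothetical placeholder, and the Jacobian is estimated by finite differences rather than by the backpropagation-style computation a real implementation would use.

```python
import numpy as np

def jacobian(g, X, w, eps=1e-6):
    """Jacobian Z of the model, z_kj = dg(x_k, w)/dw_j, estimated by
    central finite differences. `g(x, w)` is a hypothetical model
    function returning a scalar prediction."""
    N, q = len(X), len(w)
    Z = np.empty((N, q))
    for j in range(q):
        dw = np.zeros(q)
        dw[j] = eps
        Z[:, j] = [(g(x, w + dw) - g(x, w - dw)) / (2 * eps) for x in X]
    return Z

def linearized_predictions(g, X, w_star, w):
    """First-order Taylor expansion in parameter space:
    g(x, w) ~ g(x, w*) + Z (w - w*), applied to all N examples."""
    g_star = np.array([g(x, w_star) for x in X])
    Z = jacobian(g, X, w_star)
    return g_star + Z @ (w - w_star)
```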
In order to derive a local approximation, to first order in $w - w^*$, of the gradient of the least-squares cost function, a second-order approximation of the cost function, hence a second-order approximation of the model output, must be used.
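To make that step explicit (a sketch under standard assumptions, writing $J$ for the least-squares cost, $r^*$ for the vector of residuals at the minimum $w^*$, and neglecting the terms weighted by the second derivatives of the model, as in the Gauss-Newton approximation):

$$J(w) \approx J(w^*) - 2\, r^{*\top} Z\, (w - w^*) + (w - w^*)^{\top} Z^{\top} Z\, (w - w^*),$$

so that, to first order,

$$\nabla J(w) \approx -2\, Z^{\top} r^* + 2\, Z^{\top} Z\, (w - w^*) = 2\, Z^{\top} Z\, (w - w^*),$$

the last equality holding because the gradient vanishes at the minimum, i.e. $Z^{\top} r^* = 0$.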