$$E_t = \frac{1}{N} \sum_{k=1}^{N} \left( r_k^{(-k)} \right)^2$$
is computed. As in the case of cross-validation, models of increasing complexity are designed, until the leave-one-out score starts to increase with complexity.
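To make the procedure concrete, here is a minimal Python sketch of the computation of $E_t$; the `fit` and `predict` callables are hypothetical stand-ins for whatever training algorithm is used, not names from the text.

```python
import numpy as np

def leave_one_out_score(fit, predict, X, y):
    """Leave-one-out score E_t = (1/N) * sum_k (r_k^(-k))^2, where
    r_k^(-k) is the residual on example k of a model trained on all
    examples except k. `fit(X, y)` and `predict(model, x)` are
    hypothetical placeholders for the actual training procedure."""
    N = len(y)
    squared_residuals = np.empty(N)
    for k in range(N):
        keep = np.arange(N) != k            # withdraw example k
        model = fit(X[keep], y[keep])       # retrain on the N-1 remaining examples
        r_k = y[k] - predict(model, X[k])   # residual on the withdrawn example
        squared_residuals[k] = r_k ** 2
    return squared_residuals.mean()
```

In a model-selection loop, this score would be computed for each candidate complexity, and one would retain the largest complexity reached before the score starts to rise.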
The main drawback of the leave-one-out technique is that it is computationally very demanding, since N models must be trained; on the other hand, it can be shown that the leave-one-out score is an unbiased estimator of the generalization error [Vapnik 1995].
In the next section, we discuss a slightly different technique, whose computation time is roughly the computation time of leave-one-out divided by a factor of N (the number of examples). It is based on the fact that the withdrawal of a single example from the training set should not lead to a very different model, so that a model that is locally linear in parameter space, in the neighborhood of the minimum of the cost function, can be designed; powerful results from the theory of linear regression can therefore be brought to bear.
2.6.3 Local Least Squares: Effect of Withdrawing an Example
from the Training Set, and Virtual Leave-One-Out
In the present section, we show that the effect of withdrawing an example from
the training set on a nonlinear model can be predicted. Specifically, we prove
that the modeling error made by the model on the withdrawn example can
be accurately predicted without actually withdrawing the example (virtual
leave-one-out), and that a confidence interval on the predictions of the model
can be estimated. Finally, we show that the influence of an observation on
the model can be summarized with a single parameter: the leverage of the
observation.
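To anticipate the main result in computational form: in linear least squares, the residual on a withdrawn example obeys the classical identity $r_k^{(-k)} = r_k / (1 - h_{kk})$, where $h_{kk}$ is the leverage of observation $k$; the derivation below extends this locally to nonlinear models. A minimal numpy sketch of that identity (the function name is ours, and $Z$ and the residuals are assumed available from training):

```python
import numpy as np

def virtual_loo_residuals(Z, r):
    """Virtual leave-one-out: estimate the residual on each withdrawn
    example without retraining, via r_k^(-k) = r_k / (1 - h_kk).
    Exact for linear models, a local approximation for nonlinear ones.
    Z : (N, q) Jacobian of the model (matrix of observations in the
        linear case); r : (N,) residuals of the model trained on all
        N examples. The leverages h_kk are the diagonal elements of
        the hat matrix H = Z (Z^T Z)^{-1} Z^T."""
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # hat (projection) matrix
    h = np.diag(H)                          # leverages h_kk
    return r / (1.0 - h)
```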
2.6.3.1 Local Approximation of the Least Squares Method
Consider a model $g(x, w)$. A first-order Taylor expansion of the model, in parameter space, in the neighborhood of the minimum $w^*$ of the cost function, can be written as

$$g(x, w) \approx g(x, w^*) + Z\,(w - w^*),$$
where g is the vector of the N predictions of the model, and where Z is
the Jacobian matrix of the model, as defined above. That model is linear
with respect to its parameters, and matrix Z is equivalent to the matrix of
observations.
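As an illustration, here is a small numpy sketch of this linearization; the model function `g(x, w)` is a hypothetical placeholder, and the Jacobian is estimated by finite differences rather than by the backpropagation-style computation a real implementation would use.

```python
import numpy as np

def jacobian(g, X, w, eps=1e-6):
    """Jacobian Z of the model, z_kj = dg(x_k, w)/dw_j, estimated by
    central finite differences. `g(x, w)` is a hypothetical model
    function returning a scalar prediction."""
    N, q = len(X), len(w)
    Z = np.empty((N, q))
    for j in range(q):
        dw = np.zeros(q)
        dw[j] = eps
        Z[:, j] = [(g(x, w + dw) - g(x, w - dw)) / (2 * eps) for x in X]
    return Z

def linearized_predictions(g, X, w_star, w):
    """First-order Taylor expansion in parameter space:
    g(x, w) ~ g(x, w*) + Z (w - w*), applied to all N examples."""
    g_star = np.array([g(x, w_star) for x in X])
    Z = jacobian(g, X, w_star)
    return g_star + Z @ (w - w_star)
```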
In order to derive a local approximation, to first order in $w - w^*$, of the gradient of the least-squares cost function, a second-order approximation of the cost function, hence a second-order approximation of the model output, must be used.
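To make that step explicit (a sketch under standard assumptions, writing $J$ for the least-squares cost, $r^*$ for the vector of residuals at the minimum $w^*$, and neglecting the terms weighted by the second derivatives of the model, as in the Gauss-Newton approximation):

$$J(w) \approx J(w^*) - 2\, r^{*\top} Z\, (w - w^*) + (w - w^*)^{\top} Z^{\top} Z\, (w - w^*),$$

so that, to first order,

$$\nabla J(w) \approx -2\, Z^{\top} r^* + 2\, Z^{\top} Z\, (w - w^*) = 2\, Z^{\top} Z\, (w - w^*),$$

the last equality holding because the gradient vanishes at the minimum, i.e. $Z^{\top} r^* = 0$.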