must be used ([Monari et al. 2000]; the same result is derived in [Seber et
al. 1989], albeit with an incorrect proof). The following approximation of the
least-squares solution $w_{LS}$ is found:
$$w_{LS} = w + (Z^{T}Z)^{-1} Z^{T}\left[y_p - g(x, w)\right].$$
That result is approximate for a nonlinear model, but is exact for a linear
model: in the case of a linear model, $Z$ is the matrix of observations
$\Xi$, and $g(x, w) = \Xi w$. Then one gets
$$w_{LS} = w + (\Xi^{T}\Xi)^{-1}\Xi^{T} y_p - (\Xi^{T}\Xi)^{-1}\Xi^{T}\Xi w = (\Xi^{T}\Xi)^{-1}\Xi^{T} y_p,$$
which is the exact result, as shown in the section devoted to the training of
linear models.
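To make that exactness claim concrete, here is a minimal numerical sketch (not from the text; the data sizes, random inputs, and NumPy usage are illustrative assumptions) showing that, for a linear model, the update formula returns the ordinary least-squares solution regardless of the starting parameter vector $w$:

```python
import numpy as np

# Minimal sketch: for a linear model g(x, w) = Xi @ w, the update
# w_LS = w + (Xi^T Xi)^{-1} Xi^T (y_p - Xi w) is exact for any starting w.
rng = np.random.default_rng(0)
N, q = 50, 3                       # N examples, q parameters (illustrative)
Xi = rng.normal(size=(N, q))       # matrix of observations (assumed full rank)
y_p = rng.normal(size=N)           # measured process outputs

w = rng.normal(size=q)             # arbitrary initial parameter vector
w_ls = w + np.linalg.solve(Xi.T @ Xi, Xi.T @ (y_p - Xi @ w))

w_direct = np.linalg.lstsq(Xi, y_p, rcond=None)[0]   # (Xi^T Xi)^{-1} Xi^T y_p
print(np.allclose(w_ls, w_direct))                   # True: exact for linear models
```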
2.6.3.2 The Effect of Withdrawing an Example on the Model
The results of the previous section are useful for estimating the effect, on the
predictions of the model, of withdrawing an example from the training set.
As defined in the section on leave-one-out, we use the superscript $(-k)$ for all
quantities related to a model that was designed after withdrawing example k
from the training set; the quantities that have no superscript are related to
models whose training was performed with all available data.
The Effect of Withdrawing an Example on the Prediction: Virtual
Leave-One-Out
Assuming that the withdrawal of example k has a small effect on the least-
squares solution, the relation that was derived in the previous section can be
used to compute the vector of the parameters of the model that is trained
with the training set deprived of example k, as a function of the vector of the
parameters of the model trained with the whole data set,
$$w_{LS}^{(-k)} = w_{LS} - \frac{(Z^{T}Z)^{-1} z_k \, r_k}{1 - h_{kk}},$$
where $z_k$ is the vector whose components are the $k$th row of the Jacobian
matrix $Z$ (equivalently, the $k$th column of $Z^{T}$), $r_k$ is the prediction
error (or residual) on example k when the latter belongs to the training set,
$$r_k = y_{pk} - g(x_k, w_{LS}),$$
and where $h_{kk} = z_k^{T}(Z^{T}Z)^{-1} z_k$ is the leverage of example k [Lawrance
1995]. Geometrically, $h_{kk}$ is the $k$th component of the projection, onto the
solution subspace, of the unit vector borne by axis $k$. Since these quantities
are the diagonal elements of an orthogonal projection matrix, they obey the
following relations:
$$\sum_{k=1}^{N} h_{kk} = q, \qquad 0 < h_{kk} < 1.$$
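To illustrate how these quantities fit together, the following Python sketch (an illustrative construction, not from the text; random linear data are assumed, for which the virtual leave-one-out update is exact) computes the leverages, checks the two relations above, and compares the virtual leave-one-out parameters $w_{LS}^{(-k)}$ with those obtained by explicitly retraining without example k:

```python
import numpy as np

# Illustrative sketch: random linear data, for which the virtual
# leave-one-out formula is exact (for a nonlinear model it is approximate).
rng = np.random.default_rng(1)
N, q = 40, 4
Z = rng.normal(size=(N, q))                 # Jacobian; equals Xi for a linear model
y_p = Z @ rng.normal(size=q) + 0.1 * rng.normal(size=N)

ZtZ_inv = np.linalg.inv(Z.T @ Z)
w_ls = ZtZ_inv @ Z.T @ y_p                  # least-squares solution on all N examples
r = y_p - Z @ w_ls                          # residuals r_k
h = np.einsum('ij,jl,il->i', Z, ZtZ_inv, Z) # leverages h_kk = z_k^T (Z^T Z)^{-1} z_k

# The two relations: the leverages sum to q and each lies in (0, 1).
print(np.isclose(h.sum(), q), bool((0 < h).all() and (h < 1).all()))

k = 7                                       # example to withdraw
w_virtual = w_ls - ZtZ_inv @ Z[k] * (r[k] / (1 - h[k]))

mask = np.arange(N) != k                    # explicit refit without example k
w_refit = np.linalg.lstsq(Z[mask], y_p[mask], rcond=None)[0]
print(np.allclose(w_virtual, w_refit))      # True: virtual and actual LOO agree
```

For a nonlinear model the final comparison would only hold approximately, since the update relies on the local linearization of the model through the Jacobian $Z$.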