must be used ([Monari et al. 2000]; the same result is derived in [Seber et
al. 1989], albeit with an incorrect proof). The following approximation of the
least-squares solution $w_{LS}$ is found:
$$w_{LS} = w + (Z^{T}Z)^{-1} Z^{T}\left[y_p - g(x, w)\right].$$
That result is approximate for a nonlinear model, but is exact for a linear
model: in the case of a linear model, $Z$ is the matrix of observations
$\Xi$, and $g(x, w) = \Xi w$. Then one gets
$$w_{LS} = w + (\Xi^{T}\Xi)^{-1}\Xi^{T} y_p - (\Xi^{T}\Xi)^{-1}\Xi^{T}\Xi w = (\Xi^{T}\Xi)^{-1}\Xi^{T} y_p,$$
which is the exact result, as shown in the section devoted to the training of
linear models.
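To make that exactness claim concrete, here is a minimal numerical sketch (not from the text; the data sizes, random inputs, and NumPy usage are illustrative assumptions) showing that, for a linear model, the update formula returns the ordinary least-squares solution regardless of the starting parameter vector $w$:

```python
import numpy as np

# Minimal sketch: for a linear model g(x, w) = Xi @ w, the update
# w_LS = w + (Xi^T Xi)^{-1} Xi^T (y_p - Xi w) is exact for any starting w.
rng = np.random.default_rng(0)
N, q = 50, 3                       # N examples, q parameters (illustrative)
Xi = rng.normal(size=(N, q))       # matrix of observations (assumed full rank)
y_p = rng.normal(size=N)           # measured process outputs

w = rng.normal(size=q)             # arbitrary initial parameter vector
w_ls = w + np.linalg.solve(Xi.T @ Xi, Xi.T @ (y_p - Xi @ w))

w_direct = np.linalg.lstsq(Xi, y_p, rcond=None)[0]   # (Xi^T Xi)^{-1} Xi^T y_p
print(np.allclose(w_ls, w_direct))                   # True: exact for linear models
```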
2.6.3.2 The Effect of Withdrawing an Example on the Model
The results of the previous section are useful for estimating the effect, on the
predictions of the model, of withdrawing an example from the training set.
As defined in the section on leave-one-out, we use the superscript $(-k)$ for all
quantities related to a model that was designed after withdrawing example k
from the training set; the quantities that have no superscript are related to
models whose training was performed with all available data.
The Effect of Withdrawing an Example on the Prediction: Virtual
Leave-One-Out
Assuming that the withdrawal of example k has a small effect on the least-
squares solution, the relation that was derived in the previous section can be
used to compute the vector of the parameters of the model that is trained
with the training set deprived of example k, as a function of the vector of the
parameters of the model trained with the whole data set,
$$w_{LS}^{(-k)} = w_{LS} - \frac{(Z^{T}Z)^{-1} z_k \, r_k}{1 - h_{kk}},$$
where $z_k$ is the vector whose components are the $k$th row of the Jacobian
matrix $Z$ (equivalently, the $k$th column of $Z^{T}$), $r_k$ is the prediction
error (or residual) on example k when the latter belongs to the training set,
$$r_k = y_{pk} - g(x_k, w_{LS}),$$
and where $h_{kk} = z_k^{T}(Z^{T}Z)^{-1} z_k$ is the leverage of example k [Lawrance
1995]. Geometrically, $h_{kk}$ is the $k$th component of the projection, onto the
solution subspace, of the unit vector borne by axis $k$. Since these quantities
are the diagonal elements of an orthogonal projection matrix, they obey the
following relations:
$$\sum_{k=1}^{N} h_{kk} = q, \qquad 0 < h_{kk} < 1.$$
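To illustrate how these quantities fit together, the following Python sketch (an illustrative construction, not from the text; random linear data are assumed, for which the virtual leave-one-out update is exact) computes the leverages, checks the two relations above, and compares the virtual leave-one-out parameters $w_{LS}^{(-k)}$ with those obtained by explicitly retraining without example k:

```python
import numpy as np

# Illustrative sketch: random linear data, for which the virtual
# leave-one-out formula is exact (for a nonlinear model it is approximate).
rng = np.random.default_rng(1)
N, q = 40, 4
Z = rng.normal(size=(N, q))                 # Jacobian; equals Xi for a linear model
y_p = Z @ rng.normal(size=q) + 0.1 * rng.normal(size=N)

ZtZ_inv = np.linalg.inv(Z.T @ Z)
w_ls = ZtZ_inv @ Z.T @ y_p                  # least-squares solution on all N examples
r = y_p - Z @ w_ls                          # residuals r_k
h = np.einsum('ij,jl,il->i', Z, ZtZ_inv, Z) # leverages h_kk = z_k^T (Z^T Z)^{-1} z_k

# The two relations: the leverages sum to q and each lies in (0, 1).
print(np.isclose(h.sum(), q), bool((0 < h).all() and (h < 1).all()))

k = 7                                       # example to withdraw
w_virtual = w_ls - ZtZ_inv @ Z[k] * (r[k] / (1 - h[k]))

mask = np.arange(N) != k                    # explicit refit without example k
w_refit = np.linalg.lstsq(Z[mask], y_p[mask], rcond=None)[0]
print(np.allclose(w_virtual, w_refit))      # True: virtual and actual LOO agree
```

For a nonlinear model the final comparison would only hold approximately, since the update relies on the local linearization of the model through the Jacobian $Z$.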