Thus, the confidence intervals on the prediction of the model involve the same quantities h_{kk} (the leverages) as the prediction of the effect of withdrawing an example from the training set. That is not surprising, since both sets of relations arise from a Taylor expansion of the output of the model.
The confidence interval on the prediction for an example that is withdrawn from the training set can also be estimated: given an input vector x_k, an approximate confidence interval on the prediction E_{(-k)}(Y_p \mid x_k) of that example is given by

g(x_k, w_{LS}) \pm t^{\alpha}_{N-q-1} \, s_{(-k)} \sqrt{\frac{h_{kk}}{1 - h_{kk}}}

[Seber et al. 1989]. In general, s_{(-k)} can be approximated by s.
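For a model that is linear in its parameters, the quantities above can be computed directly from the N x q matrix X of partial derivatives of the model output with respect to the parameters. The sketch below is illustrative only (the function names and the caller-supplied Student quantile are assumptions, not notation from the text): it computes the leverages h_kk and the half-width t * s * sqrt(h_kk / (1 - h_kk)) of the approximate interval, with s used in place of s_(-k) as the text suggests.

```python
import numpy as np

def leverages(X):
    """Diagonal h_kk of the orthogonal projection ("hat") matrix
    H = X (X^T X)^{-1} X^T, computed via QR for numerical stability.

    For a model linear in its parameters, X is the N x q matrix whose
    rows are the derivatives of the model output w.r.t. the parameters.
    """
    Q, _ = np.linalg.qr(X)            # columns of Q span the solution subspace
    return np.sum(Q**2, axis=1)       # row norms of Q give the leverages

def loo_ci_halfwidth(X, residuals, t_quantile):
    """Half-width of the approximate confidence interval on the prediction
    of each example when it is withdrawn from the training set:
        t^alpha_{N-q-1} * s * sqrt(h_kk / (1 - h_kk)),
    where s, the residual standard error on the full training set,
    approximates s_(-k)."""
    N, q = X.shape
    h = leverages(X)
    s = np.sqrt(np.sum(np.asarray(residuals)**2) / (N - q))
    return t_quantile * s * np.sqrt(h / (1.0 - h))
```

Forming the projection through a QR decomposition avoids the explicit (and potentially ill-conditioned) inverse of X^T X.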
Interpretation of the Leverages
The leverages are the diagonal elements of an orthogonal projection matrix: they sum to the dimension of the subspace onto which that matrix projects. In the present case, the projection is onto the solution subspace, whose dimension is equal to the number of parameters of the model; therefore, the sum of the leverages is equal to the number of degrees of freedom of the model. That property can also be expressed as follows: the leverage of example k is the fraction of the degrees of freedom used for fitting example k [Monari et al. 2000, 2002].
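This trace property is easy to verify numerically. The toy check below (a linear-in-parameters example with arbitrarily chosen sizes and seed, not taken from the text) confirms that the diagonal elements of the projection matrix sum to q:

```python
import numpy as np

# Toy check that the leverages sum to the number of parameters q:
# the trace of an orthogonal projection matrix equals the dimension
# of the subspace it projects onto.
rng = np.random.default_rng(0)
N, q = 50, 4
X = rng.standard_normal((N, q))          # N x q matrix of partial derivatives (full rank)
H = X @ np.linalg.solve(X.T @ X, X.T)    # orthogonal projection onto the solution subspace
h = np.diag(H)                           # leverages h_kk
print(np.isclose(h.sum(), q))            # True
```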
Some specific cases are of interest:

- If all leverages are equal, they are equal to q/N: a fraction q/N of the parameters of the model is devoted to each example, and all examples have the same influence on the model. Such a model should not exhibit overfitting, since it is not "focused" on any particular example; that property can be put to advantage for model selection, as shown below.
- If the leverage of example k is equal to zero, the model devotes no degree of freedom to that example. That has a simple geometric interpretation: h_{kk} is the k-th component of the projection, onto the solution subspace, of the unit vector borne by axis k in observation space; if that axis is orthogonal to the solution subspace, example k does not contribute to the model output, which lies in the solution subspace (see Fig. 2.5); therefore, it has no influence on the parameters of the model. Whether that example is in the training set or has been withdrawn from it, the prediction of that example has the same error, as evidenced by the relation r_k^{(-k)} = r_k / (1 - h_{kk}). The confidence interval on that prediction is zero: the prediction of the model is equal, with certainty, to the expectation value of the quantity of interest.
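The relation between the ordinary residual and the residual of the withdrawn example can be checked numerically. In the linear least-squares sketch below (sizes, seed, noise level, and the withdrawn index k are arbitrary illustrative choices), refitting without example k reproduces r_k / (1 - h_kk):

```python
import numpy as np

# Numerical check of r_k^(-k) = r_k / (1 - h_kk) for linear least squares:
# the residual of example k under the model fitted WITHOUT example k equals
# the ordinary residual divided by (1 - leverage of example k).
rng = np.random.default_rng(1)
N, q = 30, 3
X = rng.standard_normal((N, q))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(N)

w, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ w                                    # ordinary residuals
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))   # leverages h_kk

k = 7
mask = np.arange(N) != k
w_loo, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
r_loo = y[k] - X[k] @ w_loo                      # residual of the withdrawn example

print(np.isclose(r_loo, r[k] / (1.0 - h[k])))    # True
```

For a model that is linear in its parameters this identity is exact, which is what makes the "virtual" leave-one-out quantities computable without actually refitting.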
The fact that the confidence interval is equal to zero does not mean that the prediction at the corresponding point is exact. Nor is it contradictory with the fact that the prediction error r_k is not zero: the prediction error is the difference between the measured value and the predicted value; it contains both the modeling error (difference between the predicted value and the unknown