The estimate at a single point can be written as a weighted linear combination of the responses, $\hat m(x) = v^{\top} y$, where the vector of weights $v$ depends on the covariate values and on the smoothing parameter. The estimate can therefore be computed at a set of covariate values through the expression $Sy$, where $S$ denotes a smoothing matrix whose rows contain the vectors $v$ required to construct the estimate at the points of interest.
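As a concrete sketch of this representation, the rows of S can be built explicitly. The local linear form, the Gaussian kernel, the simulated data and all names below are illustrative assumptions, not the chapter's own code.

```python
import numpy as np

def local_linear_smoothing_matrix(x, h, eval_points=None):
    """Build S so that S @ y gives local linear estimates at eval_points.

    Each row of S is the weight vector v that produces the estimate at
    one evaluation point.  (Gaussian kernel assumed for illustration.)
    """
    if eval_points is None:
        eval_points = x          # fitted values at the observed covariates
    n = len(x)
    S = np.zeros((len(eval_points), n))
    for j, x0 in enumerate(eval_points):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)    # kernel weights w(x_i - x0; h)
        X = np.column_stack([np.ones(n), x - x0]) # local linear design matrix
        XtW = X.T * w                             # X'W  (W = diag(w))
        # Rows of (X'WX)^{-1} X'W map y to the local (intercept, slope);
        # the intercept row is the weight vector v for the estimate at x0.
        S[j] = np.linalg.solve(XtW @ X, XtW)[0]
    return S

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 50))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 50)
S = local_linear_smoothing_matrix(x, h=0.1)
fitted = S @ y    # the estimate, linear in the response data y
```

A useful check on such a construction is that local linear weights reproduce constant and linear functions exactly, so each row of S sums to one.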
This representation emphasises the important fact that the estimation process is linear in the response data y. It also suggests useful analogies with standard linear modelling techniques. In particular, the degrees of freedom associated with a linear model can be identified as the trace of the projection matrix P which creates the fitted values as $\hat y = Py$. It is therefore convenient to define the approximate degrees of freedom associated with a nonparametric estimate as ν = tr{S}, where S is the smoothing matrix which creates the fitted values at the observed covariate values $x_i$, $i = 1, \ldots, n$. As the smoothing parameter h is increased, the influence of the weight function extends across a greater range of the covariate axis and the flexibility of the estimate is reduced. This corresponds to a reduction in the approximate degrees of freedom associated with the estimate.
The approximate degrees of freedom therefore provide a helpful alternative scale on which the degree of smoothness can be expressed. The estimate for year shown in Fig. . was produced with a smoothing parameter h corresponding to four degrees of freedom. This allows a moderate degree of flexibility in the curve beyond the two degrees of freedom associated with a simple linear shape.
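The inverse relationship between h and ν described above can be checked numerically. This sketch again assumes a local linear smoother with a Gaussian kernel on an evenly spaced design; none of these choices is prescribed by the text.

```python
import numpy as np

def smoothing_matrix(x, h):
    # Local linear smoother (Gaussian kernel assumed); fitted values are S @ y.
    n = len(x)
    S = np.zeros((n, n))
    for j, x0 in enumerate(x):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)
        X = np.column_stack([np.ones(n), x - x0])
        XtW = X.T * w
        S[j] = np.linalg.solve(XtW @ X, XtW)[0]
    return S

x = np.linspace(0, 1, 60)
# Approximate degrees of freedom nu = tr(S) for increasing h:
dfs = [np.trace(smoothing_matrix(x, h)) for h in (0.02, 0.05, 0.1, 0.5)]
```

As h grows, tr(S) falls monotonically towards the two degrees of freedom of a simple linear fit, matching the discussion above.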
The choice of smoothing parameter h, or equivalently of the approximate degrees of freedom ν, is therefore of some importance. From a graphical and exploratory perspective it is helpful to plot the estimates over a wide range of smoothing parameters, to view the effects of applying different degrees of local fitting. This is particularly effective in the form of an interactive animation. However, it is also worth considering ways of automatically identifying suitable choices of smoothing parameter.
Some very effective methods of doing this have been developed for particular types of regression problem, but other proposals have the advantage of very wide applicability. One of the most popular of these has been cross-validation, where h is chosen to minimise
$$\frac{1}{n} \sum_{i=1}^{n} \left\{ y_i - \hat m_{-i}(x_i) \right\}^2 .$$
The subscript on $\hat m_{-i}$ indicates that the estimate is constructed from the dataset with the ith observation omitted, and so the criterion being minimised represents the prediction error of the estimate. There is some evidence that this approach produces substantial variation in its selected smoothing parameters. This chapter will therefore use an alternative criterion, proposed by Hurvich et al. (1998), based on Akaike's information criterion (AIC). This chooses h to minimise
$$\log\left(\frac{\mathrm{RSS}}{n}\right) + 1 + \frac{2(\nu + 1)}{n - \nu - 2} \, , \qquad ( . )$$
where RSS denotes the residual sum-of-squares $\sum_{i=1}^{n} \{ y_i - \hat m(x_i) \}^2$ and, as described above, ν = tr{S}. In general, this method offers a very useful and usually very effective
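Putting the pieces together, this criterion can drive an automatic grid search for h. The local linear smoother, Gaussian kernel, simulated data and grid below are all assumptions made for illustration rather than the chapter's own implementation.

```python
import numpy as np

def smoothing_matrix(x, h):
    # Local linear smoother (Gaussian kernel assumed); fitted values are S @ y.
    n = len(x)
    S = np.zeros((n, n))
    for j, x0 in enumerate(x):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)
        X = np.column_stack([np.ones(n), x - x0])
        XtW = X.T * w
        S[j] = np.linalg.solve(XtW @ X, XtW)[0]
    return S

def aicc(y, S):
    # Hurvich et al. criterion: log(RSS/n) + 1 + 2(nu + 1)/(n - nu - 2),
    # with RSS the residual sum-of-squares and nu = tr(S).
    n = len(y)
    rss = np.sum((y - S @ y) ** 2)
    nu = np.trace(S)
    return np.log(rss / n) + 1 + 2 * (nu + 1) / (n - nu - 2)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 80))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.25, 80)

grid = np.linspace(0.02, 0.3, 30)            # candidate smoothing parameters
scores = [aicc(y, smoothing_matrix(x, h)) for h in grid]
h_best = grid[int(np.argmin(scores))]
```

Because the criterion only needs the fitted values and tr(S), it is cheap to evaluate across a grid once the smoothing matrix can be formed.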