The estimate at a single point can be written as a weighted linear combination of the responses, $\hat m(x) = v^{\top} y$, where the vector of weights $v$ depends on the covariate values and on the smoothing parameter. The estimate can therefore be computed at a set of covariate values through the expression $Sy$, where $S$ denotes a smoothing matrix whose rows contain the vectors $v$ required to construct the estimate at the points of interest.
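As a concrete sketch of this representation, the rows of S can be built explicitly. The local linear form, the Gaussian kernel, the simulated data and all names below are illustrative assumptions, not the chapter's own code.

```python
import numpy as np

def local_linear_smoothing_matrix(x, h, eval_points=None):
    """Build S so that S @ y gives local linear estimates at eval_points.

    Each row of S is the weight vector v that produces the estimate at
    one evaluation point.  (Gaussian kernel assumed for illustration.)
    """
    if eval_points is None:
        eval_points = x          # fitted values at the observed covariates
    n = len(x)
    S = np.zeros((len(eval_points), n))
    for j, x0 in enumerate(eval_points):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)    # kernel weights w(x_i - x0; h)
        X = np.column_stack([np.ones(n), x - x0]) # local linear design matrix
        XtW = X.T * w                             # X'W  (W = diag(w))
        # Rows of (X'WX)^{-1} X'W map y to the local (intercept, slope);
        # the intercept row is the weight vector v for the estimate at x0.
        S[j] = np.linalg.solve(XtW @ X, XtW)[0]
    return S

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 50))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 50)
S = local_linear_smoothing_matrix(x, h=0.1)
fitted = S @ y    # the estimate, linear in the response data y
```

A useful check on such a construction is that local linear weights reproduce constant and linear functions exactly, so each row of S sums to one.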
This representation emphasises the important fact that the estimation process is linear in the response data y. It also suggests useful analogies with standard linear modelling techniques. In particular, the degrees of freedom associated with a linear model can be identified as the trace of the projection matrix P which creates the fitted values as $\hat y = Py$. It is therefore convenient to define the approximate degrees of freedom associated with a nonparametric estimate as ν = tr{S}, where S is the smoothing matrix which creates the fitted values at the observed covariate values $x_i$, $i = 1, \ldots, n$. As the smoothing parameter h is increased, the influence of the weight function extends across a greater range of the covariate axis and the flexibility of the estimate is reduced. This corresponds to a reduction in the approximate degrees of freedom associated with the estimate.
The approximate degrees of freedom therefore provide a helpful alternative scale on which the degree of smoothness can be expressed. The estimate for year shown in Fig. . was produced with a smoothing parameter h corresponding to four degrees of freedom. This allows a moderate degree of flexibility in the curve beyond the two degrees of freedom associated with a simple linear shape.
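The inverse relationship between h and ν described above can be checked numerically. This sketch again assumes a local linear smoother with a Gaussian kernel on an evenly spaced design; none of these choices is prescribed by the text.

```python
import numpy as np

def smoothing_matrix(x, h):
    # Local linear smoother (Gaussian kernel assumed); fitted values are S @ y.
    n = len(x)
    S = np.zeros((n, n))
    for j, x0 in enumerate(x):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)
        X = np.column_stack([np.ones(n), x - x0])
        XtW = X.T * w
        S[j] = np.linalg.solve(XtW @ X, XtW)[0]
    return S

x = np.linspace(0, 1, 60)
# Approximate degrees of freedom nu = tr(S) for increasing h:
dfs = [np.trace(smoothing_matrix(x, h)) for h in (0.02, 0.05, 0.1, 0.5)]
```

As h grows, tr(S) falls monotonically towards the two degrees of freedom of a simple linear fit, matching the discussion above.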
The choice of smoothing parameter h, or equivalently of the approximate degrees of freedom ν, is therefore of some importance. From a graphical and exploratory perspective it is helpful to plot the estimates over a wide range of smoothing parameters, to view the effects of applying different degrees of local fitting. This is particularly effective in the form of an interactive animation. However, it is also worth considering ways of automatically identifying suitable choices of smoothing parameter.
Some very effective methods of doing this have been developed for particular types of regression problem, but other proposals have the advantage of very wide applicability. One of the most popular of these has been cross-validation, where h is chosen to minimise
$$\frac{1}{n} \sum_{i=1}^{n} \left\{ y_i - \hat m_{-i}(x_i) \right\}^2 .$$
The subscript on $\hat m_{-i}$ indicates that the estimate is constructed from the dataset with the ith observation omitted, and so the criterion being minimised represents the prediction error of the estimate. There is some evidence that this approach produces substantial variation in its selected smoothing parameters. This chapter will therefore use an alternative criterion, proposed by Hurvich et al. (1998), based on Akaike's information criterion (AIC). This chooses h to minimise
$$\log\left(\frac{\mathrm{RSS}}{n}\right) + 1 + \frac{2(\nu + 1)}{n - \nu - 2} \, , \qquad ( . )$$
where RSS denotes the residual sum-of-squares $\sum_{i=1}^{n} \{ y_i - \hat m(x_i) \}^2$ and, as described above, ν = tr{S}. In general, this method offers a very useful and usually very effective
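Putting the pieces together, this criterion can drive an automatic grid search for h. The local linear smoother, Gaussian kernel, simulated data and grid below are all assumptions made for illustration rather than the chapter's own implementation.

```python
import numpy as np

def smoothing_matrix(x, h):
    # Local linear smoother (Gaussian kernel assumed); fitted values are S @ y.
    n = len(x)
    S = np.zeros((n, n))
    for j, x0 in enumerate(x):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)
        X = np.column_stack([np.ones(n), x - x0])
        XtW = X.T * w
        S[j] = np.linalg.solve(XtW @ X, XtW)[0]
    return S

def aicc(y, S):
    # Hurvich et al. criterion: log(RSS/n) + 1 + 2(nu + 1)/(n - nu - 2),
    # with RSS the residual sum-of-squares and nu = tr(S).
    n = len(y)
    rss = np.sum((y - S @ y) ** 2)
    nu = np.trace(S)
    return np.log(rss / n) + 1 + 2 * (nu + 1) / (n - nu - 2)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 80))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.25, 80)

grid = np.linspace(0.02, 0.3, 30)            # candidate smoothing parameters
scores = [aicc(y, smoothing_matrix(x, h)) for h in grid]
h_best = grid[int(np.argmin(scores))]
```

Because the criterion only needs the fitted values and tr(S), it is cheap to evaluate across a grid once the smoothing matrix can be formed.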