Graphics Reference
In-Depth Information
data may be quite difficult. The first step is examining the univariate densities of all
the individual variables, $f(x)$, followed by understanding how pairs of variables
co-vary by the examination of all bivariate density plots, $f(x, y)$.
Just as the univariate estimates can only hint at the actual bivariate structure in
data, so too bivariate estimates can only hint at the trivariate structure, and so on.
Three- and 4-D kernel density estimates are quite feasible theoretically but should be
approached with caution for several reasons. The choices of kernel and bandwidth
become more complicated, although a product kernel and diagonal bandwidth
matrix will often be reasonable. Data become thinner at higher dimensions, so larger
sample sizes are required to get reasonable results. Most importantly, plotting the
full estimate would require four and five dimensions, respectively, as a perspective
plot requires one more dimension than the number of variables.
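
The product kernel with a diagonal bandwidth matrix mentioned above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes a Gaussian kernel, and the function names and the normal-reference bandwidth rule $h_j = \sigma_j n^{-1/(d+4)}$ are choices made here for the example rather than anything prescribed by the text.

```python
import numpy as np

def product_kernel_kde(data, points, bandwidths):
    """Evaluate a Gaussian product-kernel density estimate.

    data:       (n, d) array of observations
    points:     (m, d) array of evaluation points
    bandwidths: length-d array, the diagonal of the bandwidth matrix
    """
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    h = np.asarray(bandwidths, dtype=float)
    n, d = data.shape
    # Standardized differences between every point and every observation: (m, n, d)
    u = (points[:, None, :] - data[None, :, :]) / h
    # A product of univariate Gaussian kernels is the exponential of the summed exponents
    kern = np.exp(-0.5 * np.sum(u**2, axis=2))
    kern /= (2.0 * np.pi) ** (d / 2.0) * np.prod(h)
    # Average the kernel contributions over the n observations
    return kern.mean(axis=1)

def normal_reference_bandwidths(data):
    """Assumed default: h_j = sigma_j * n**(-1/(d+4)) for each variable j."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    return data.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))

# Simulated trivariate normal sample (illustrative data only)
rng = np.random.default_rng(0)
sample = rng.standard_normal((500, 3))
h = normal_reference_bandwidths(sample)
eval_points = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
estimate = product_kernel_kde(sample, eval_points, h)
```

Because the exponent $(d+4)$ grows with the dimension, the bandwidths shrink more slowly as the sample grows in higher dimensions, which is one way to see why larger samples are needed there.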
Clearly, one must set aside any thought of examining the entire function, $f(x, y, z)$
or $f(x, y, z, t)$. However, note that a bivariate contour plot only requires two
dimensions. Likewise, a 3-D density estimate may still be examined as a 3-D contour plot.
A single contour slice is a level set of the density, for example,
$$S_\alpha = \{(x, y, z) : f(x, y, z) = \alpha f_{\max}\}$$
where $f_{\max}$ is the largest value of the density and $\alpha$ ranges from 0 to 1. For normal
data, the contour is an ellipse (or sphere). When $\alpha = 1$, $S_\alpha$ is the mode. As $\alpha$ decreases,
the size of the ellipse increases.
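
As a rough numerical check of the level-set description, the sketch below evaluates a standard trivariate normal density on a grid, converts several values of $\alpha$ into density levels $\alpha f_{\max}$, and measures how much of the grid lies inside each contour. The grid, the choice of $\alpha$ values, and the helper names are assumptions made for illustration.

```python
import numpy as np

def contour_levels(density, alphas):
    """Return the density values alpha * f_max for each alpha in (0, 1]."""
    f_max = density.max()
    return [a * f_max for a in alphas]

def enclosed_fraction(density, level):
    """Fraction of grid cells with density >= level (the region inside the contour)."""
    return float(np.mean(density >= level))

# Illustrative data: a standard trivariate normal density on a regular grid
g = np.linspace(-4.0, 4.0, 41)
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
f = np.exp(-0.5 * (X**2 + Y**2 + Z**2)) / (2.0 * np.pi) ** 1.5

# Decreasing alpha should give progressively larger shells
alphas = [0.9, 0.7, 0.5, 0.3, 0.1]
levels = contour_levels(f, alphas)
fractions = [enclosed_fraction(f, lev) for lev in levels]

# At alpha = 1 only the grid cell holding the mode remains
cells_at_mode = int(np.sum(f >= f.max()))
```

To draw the shells rather than measure them, an isosurface routine such as `skimage.measure.marching_cubes(f, level)` from scikit-image could be applied to each level; that step is left out here to keep the example dependent on NumPy only.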
Of course, a contour plot of a bivariate density is not complete if only one contour
level is displayed. Likewise, a trivariate density requires examination of at least three
to five levels, depending upon the complexity of the estimated density. This task will
require some way to see "through" the individual contours: either transparency, or
some method of removing part or all of the outer contours so that inner ones may be
viewed. Care should be taken to distinguish the different levels, and
possibly the upper and lower regions on each side of the contours as well, as otherwise
opposite features such as modes and holes may be visually indistinguishable.
When dealing with four or more variables, one's options are more limited. The best
choice appears to be to examine the conditional 3-D estimates as above, for a series of
selected "slices" in the fourth variable. For example, a series of values
$t_1 < t_2 < \cdots < t_k$ are selected and contour shells of the 3-D arrays
$$f(x, y, z, t_\ell), \qquad \ell = 1, 2, \ldots, k$$
are displayed. If hardware permits, $k$ may be taken as quite large and the sequence of
views may be animated. The animation is effective because the density contours vary
smoothly, which the human brain can easily decode. This approach is similar to some
of the ideas found in trellis coplots (Cleveland, ; Becker, et al., ).
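
The slicing scheme can be sketched as follows, assuming the 4-D estimate has already been evaluated on a regular grid. The density used here is a hypothetical four-variable normal whose mode in $x$ drifts with $t$; the helper name, the nearest-grid-point lookup, and the use of a single level tied to the global maximum are all assumptions of this example, not part of the original method.

```python
import numpy as np

def conditional_slices(density4d, t_grid, t_values):
    """Return (t, 3-D array) pairs f(x, y, z, t_l) at the grid t nearest each t_l."""
    t_grid = np.asarray(t_grid, dtype=float)
    slices = []
    for t in sorted(t_values):  # enforce t_1 < t_2 < ... < t_k
        idx = int(np.argmin(np.abs(t_grid - t)))
        slices.append((float(t_grid[idx]), density4d[..., idx]))
    return slices

# Hypothetical 4-D density on a grid; the last axis is the slicing variable t
g = np.linspace(-3.0, 3.0, 25)
t_grid = np.linspace(-2.0, 2.0, 21)
X, Y, Z, T = np.meshgrid(g, g, g, t_grid, indexing="ij")
f4 = np.exp(-0.5 * ((X - 0.5 * T) ** 2 + Y**2 + Z**2 + T**2)) / (2.0 * np.pi) ** 2

# k = 3 slices; a larger k would give the frames of an animation
frames = conditional_slices(f4, t_grid, [-1.0, 0.0, 1.0])

# One contour shell per frame, all at the same level alpha * (global maximum)
alpha = 0.5
level = alpha * f4.max()
shell_sizes = [int(np.sum(f3 >= level)) for _, f3 in frames]
```

Holding the level fixed across frames keeps the shells comparable from one slice to the next, so the shell grows and shrinks as $t$ passes through the mode. Rescaling each slice by its own maximum is an alternative when the shape within each slice matters more than its relative height.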
To illustrate these ideas, the full PRISM data set for the continental United States is
employed. There are grid points, each representing about square miles. The
variables used are elevation, precipitation, and maximum temperature for the months
December-February averaged over the period - . As the variables are quite