2.1 Introductory Notes
How do we find structure in multidimensional data when computer screens are only
two-dimensional? One approach is to project the data onto one or two dimensions.
Projections are used in classical statistical methods like principal component
analysis (PCA) and linear discriminant analysis. PCA (e.g., Johnson and Wichern)
chooses a projection to maximize the variance of the projected data. Fisher's
linear discriminant (e.g., Johnson and Wichern) chooses a projection that
maximizes the relative separation between group means. Projection pursuit (PP)
(e.g., Huber) generalizes these ideas into a common strategy, where an arbitrary
function on projections is optimized. The scatterplot matrix (e.g., Becker and
Cleveland) can also be considered a projection method. It shows projections of
the data onto all pairs of coordinate axes, the 2-D marginal projections of the
data. These projection methods choose a few select projections out of infinitely
many.
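To make the idea of "choosing a projection" concrete, here is a minimal sketch in Python with NumPy. It is not taken from the chapter or from GGobi; the function names `pca_projection` and `marginal_projection` and the toy data are assumptions made for illustration. A projection is represented as a p x d matrix with orthonormal columns, and the projected data are the product of the data matrix with that basis.

```python
import numpy as np

def pca_projection(X, d=2):
    """Return a p x d orthonormal basis spanning the top-d principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def marginal_projection(p, i, j):
    """Return the p x 2 basis that projects onto coordinate axes i and j."""
    A = np.zeros((p, 2))
    A[i, 0] = 1.0
    A[j, 1] = 1.0
    return A

# Hypothetical toy data: 200 points in 4-D, most variance along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([5.0, 1.0, 1.0, 1.0])

A_pca = pca_projection(X, d=2)
Y_pca = (X - X.mean(axis=0)) @ A_pca    # 200 x 2 view chosen by variance

A_marg = marginal_projection(4, 2, 3)
Y_marg = X @ A_marg                     # one panel of a scatterplot matrix
```

Both views are just two of the infinitely many p x 2 orthonormal bases; the methods differ only in the rule used to pick one.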
What is hidden from the user who views only a few static projections? There
could be a lot. The reader may be familiar with an ancient fable from India
about the blind men and the elephant. One grabbed his tail and swore the
creature was a rope. Another felt the elephant's ear and yelled it was a hand
fan. Yet another grabbed his trunk and exclaimed he'd found a snake. They argued
and argued about what the elephant was, until a wise man settled the fight. They
were all correct, but each described different parts of the elephant. Looking at
a few static projections of multivariate data is like the blind men feeling
parts of the elephant and inferring the nature of the whole beast.
How can a more systematic presentation of all possible projections be
constructed? Static projections can be strung together into a movie using
interpolation methods, providing the viewer with an overview of multivariate
data. These interpolation methods are commonly called tours. They provide a
general approach to choosing and viewing data projections, allowing the viewer
to mentally connect disparate views, and thus supporting the exploration of a
high-dimensional space. We use tours to explore multivariate data like we might
explore a new neighborhood: walk randomly to discover unexpected sights, employ
a guide, or guide ourselves using a map. These modes of exploration are matched
by three commonly available types of tours. They are the tours available in the
software GGobi (Swayne et al.), which is used in this chapter to illustrate the
methods.
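The idea of stringing projections into a movie can be sketched as follows. This is an illustrative simplification, not GGobi's algorithm: it blends two projection bases linearly and re-orthonormalizes each blend with a QR decomposition. Tour software typically uses a more careful interpolation between planes. The names `random_basis` and `interpolate_bases` are hypothetical.

```python
import numpy as np

def random_basis(p, d, rng):
    """Draw a random p x d orthonormal basis via QR of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(p, d)))
    return Q

def interpolate_bases(A_start, A_end, n_steps):
    """Yield orthonormal p x d bases moving from A_start to A_end.

    Assumption: a linear blend followed by QR stands in for the
    interpolation a real tour would use. It fails if a blend loses rank
    (for example, A_end close to -A_start), which this sketch ignores.
    """
    for t in np.linspace(0.0, 1.0, n_steps):
        blend = (1.0 - t) * A_start + t * A_end
        Q, R = np.linalg.qr(blend)
        # Flip column signs so the frame keeps the blend's orientation.
        signs = np.sign(np.diag(R))
        signs[signs == 0] = 1.0
        yield Q * signs

# Hypothetical data: 100 points in 5-D, viewed through 10 interpolated frames.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
A0 = random_basis(5, 2, rng)
A1 = random_basis(5, 2, rng)

bases = list(interpolate_bases(A0, A1, n_steps=10))
frames = [X @ A for A in bases]   # each frame is a 100 x 2 scatterplot
```

Plotting the frames in sequence gives a short movie from one view to the other. Drawing a fresh random target each time one is reached would give a walk in the spirit of a grand tour.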
In the grand tour, we walk randomly around the landscape discovering unexpected
sights: the grand tour shows all projections of the multivariate data. This
requires time, and we may spend a lot of it wandering around boring places and
miss the highlights.
Using a PP guided tour, we employ a tour guide who takes us to the features that
they think are interesting. We improve the probability of stopping by the
interesting sights by selecting more views that are interesting based on a PP
index.
Manual control takes the steering wheel back from the guide, enabling the tourist
to decide on the next direction. We choose a direction by controlling the projec-