5.5 Graph-theoretic Analytics
Some graph-analytic procedures naturally lend themselves to visualization or are
based on geometric graphs. We discuss a few in this section.
5.5.1 Scagnostics
A scatterplot matrix, variously called a SPLOM, casement plot, or draftsman's plot, is
a (usually) symmetric matrix of pairwise scatterplots. An easy way to conceptualize
a symmetric SPLOM is to think of a covariance matrix of p variables and imagine that
each off-diagonal cell consists of a scatterplot of n cases rather than a scalar number
representing a single covariance. This display was first published by John Hartigan
( ) and was popularized by Tukey and his associates at Bell Laboratories.
Large scatterplot matrices become unwieldy when there are many variables. First
of all, the visual resolution of the display is limited when there are many cells. This
defect can be ameliorated by pan and zoom controls. More critical, however, is the
multiplicity problem in visual exploration. Looking for patterns in p(p - 1)/2
scatterplots becomes impractical once p grows beyond a modest number of variables.
This problem is what prompted the Tukeys' solution.
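The growth of the visual task can be made concrete with a short sketch. It assumes that only the distinct unordered pairs of variables are inspected, one per off-diagonal scatterplot; the function names are illustrative and do not come from any scagnostics software.

```python
from itertools import combinations


def splom_pairs(variables):
    """Return the distinct unordered variable pairs, one per scatterplot."""
    return list(combinations(variables, 2))


def n_panels(p):
    """Number of distinct pairwise scatterplots among p variables."""
    return p * (p - 1) // 2


names = ["a", "b", "c", "d"]
pairs = splom_pairs(names)          # 6 pairs for 4 variables
counts = {p: n_panels(p) for p in (5, 10, 50, 100)}
```

The count grows quadratically in p, so doubling the number of variables roughly quadruples the number of panels a viewer would have to scan.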
The Tukeys reduced an O(p^2) visual task to an O(k^2) visual task, where k is a small
number of measures of the distribution of a 2-D scatter of points. These measures
included the area of the peeled convex hull of the 2-D point scatters, the perimeter
length of this hull, the area of closed 2-D kernel density isolevel contours, the
perimeter length of these contours, the convexity of these contours, a modality measure
of the 2-D kernel densities, a nonlinearity measure based on principal curves fitted to
the 2-D scatterplots, the median nearest-neighbor distance between points, and several
others. By using these measures, the Tukeys aimed to detect anomalies in density,
distributional shape, trend, and other features in 2-D point scatters.
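Three of the measures listed above (hull area, hull perimeter, and median nearest-neighbor distance) are simple enough to sketch in a few lines. The following is a minimal illustration, not the Tukeys' implementation: the peeling rule (strip a fixed number of outer hulls) and all function names are assumptions made for this example.

```python
import math
from statistics import median


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def peeled_hull(points, layers=1):
    """Strip `layers` outer hulls (hull vertices only), then take the hull
    of what remains. The peeling rule is an assumption for illustration."""
    remaining = list(set(points))
    for _ in range(layers):
        outer = set(convex_hull(remaining))
        inner = [p for p in remaining if p not in outer]
        if len(inner) < 3:
            break  # too few points left to form a polygon
        remaining = inner
    return convex_hull(remaining)


def polygon_area(poly):
    """Shoelace formula for a simple polygon."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1]
            - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))
    return abs(s) / 2


def polygon_perimeter(poly):
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))


def median_nn_distance(points):
    """Median distance to the nearest other point (brute force, O(n^2))."""
    return median(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
hull = convex_hull(square)
nested = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (3, 3), (1, 3)]
inner_hull = peeled_hull(nested, layers=1)
```

Each function returns a single number (or a polygon from which one is derived), so a scatterplot of any size collapses to a short vector of measures.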
After calculating these measures, the Tukeys constructed a scatterplot matrix of
the measures themselves, in which each point in the scagnostic SPLOM represented
a scatterplot cell in the original data SPLOM. With brushing and linking tools,
unusual scatterplots could be identified from outliers in the scagnostic SPLOM.
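The mapping from scatterplots to points in measure space can be sketched as follows. This is an assumption-laden illustration: the single toy measure (absolute Pearson correlation) and the boxplot-style fence used to flag outliers are stand-ins chosen for brevity, not the measures or the interactive brushing tools described above.

```python
from itertools import combinations
from statistics import mean, quantiles


def abs_correlation(xs, ys):
    """Toy measure: absolute Pearson correlation of one scatterplot."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def scagnostic_table(data, measures):
    """Map each variable pair (one scatterplot) to its vector of measures.

    data: dict of variable name -> list of values, all of equal length.
    measures: list of functions taking (xs, ys) and returning a number.
    """
    return {
        (a, b): [f(data[a], data[b]) for f in measures]
        for a, b in combinations(data, 2)
    }


def flag_unusual(table, column, k=1.5):
    """Return pairs whose value in `column` lies outside a boxplot fence."""
    values = [row[column] for row in table.values()]
    q1, _, q3 = quantiles(values, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [pair for pair, row in table.items() if not lo <= row[column] <= hi]


data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [1, -1, 1, -1]}
table = scagnostic_table(data, [abs_correlation])
```

Each row of the table is one point in the scagnostic SPLOM; a row flagged by `flag_unusual` corresponds to a scatterplot worth inspecting in the original data SPLOM.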
Wilkinson et al. ( ) extended this procedure using proximity graphs. This
extension improved scalability, because the graph calculations are O(n log n) in the
number of points n, and allowed the method to be applied to categorical and continuous
variables. Wilkinson et al. ( ) developed nine scagnostics measures: Outlying, Skewed,
Clumpy, Convex, Skinny, Striated, Stringy, Straight and Monotonic.
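To give a sense of what a measure built on a proximity graph looks like, the sketch below computes a minimum spanning tree over the points and reports the share of its total length carried by unusually long edges. It is loosely in the spirit of an outlier-oriented measure, but the cutoff rule and the name `long_edge_share` are assumptions for illustration and should not be read as the published definition of any of the nine measures. Prim's algorithm on the complete graph is used for brevity; it is quadratic in the number of points and is not the scalable construction.

```python
import math
from statistics import quantiles


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, length) edges; O(n^2) time.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n    # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]),
                key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def long_edge_share(points, k=1.5):
    """Share of MST length in edges longer than q3 + k * IQR (assumed rule)."""
    lengths = [w for _, _, w in minimum_spanning_tree(points)]
    if len(lengths) < 2:
        return 0.0  # quantiles() needs at least two values
    q1, _, q3 = quantiles(lengths, n=4)
    cutoff = q3 + k * (q3 - q1)
    total = sum(lengths)
    return sum(w for w in lengths if w > cutoff) / total if total else 0.0


cloud = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (30, 0)]
tree = minimum_spanning_tree(cloud)
share = long_edge_share(cloud)
```

In this toy scatter one distant point forces a single long edge into the tree, so most of the tree's length sits in that edge and the share is close to 1; an evenly spaced scatter yields 0.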
Figure . shows the output of the program developed in Wilkinson et al. ( ).
The dataset used in the example is the Boston housing data cited in Breiman et al.
( ). The left SPLOM shows the data. The larger scagnostics SPLOM in the middle
of the figure shows the distribution of the nine scagnostics. One point is
highlighted. This point is an especially large value on the Outlying scagnostic statistic.
Its corresponding scatterplot is shown in the upper-right plot superimposed on the
scagnostics SPLOM. This plot involves a dummy variable for whether a tract bounds