plots. These authors have also noted the difficulties that arise due to the overplotting
of characters designed to represent a single observation.
This phenomenon, referred to as “too much ink” by Tufte ( ), has given rise to
the use of more formal density estimators, in particular the histogram and the
kernel density estimator, to visualize bivariate distributions and relationships in data.
Bivariate density estimates remain straightforward to calculate but require more
sophisticated visualization techniques to plot.
5.2.1 Bivariate Histograms
Bivariate histograms share many of the strengths and weaknesses of their univariate
cousins. They remain simple to calculate, with the computational effort being
primarily focused on determining the counts of the observations in what are now 2-D
bins. Bin size and location issues remain important, although there is also the
additional issue of the shape of the bivariate bins. Again, one should examine multiple
examples when possible to discover what are true features and what are artifacts of
the histogram mesh.
For a rectangular mesh, an asymptotic analysis of the MISE, similar to the univariate
case, shows that the optimal size of the edges of the bivariate bins is proportional
to $n^{-1/4}$. Assuming uncorrelated and normally distributed data gives rise to the
normal reference rule $h_k = 3.5\,\sigma_k n^{-1/4}$ for $k = 1, 2$, where $\sigma_k$ is
the standard deviation for the $k$th variable. See Scott ( ) for details.
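
The normal reference rule and the bin-counting step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it takes the constant in the rule as 3.5 with exponent -1/4, uses the sample standard deviation for each variable, and runs on simulated data. The function names and the `origin` parameter are hypothetical and introduced only for this example.

```python
import math
import random
import statistics
from collections import Counter


def normal_reference_widths(x, y):
    """Bin widths h_k = 3.5 * sigma_k * n**(-1/4) for k = 1, 2 (assumed form of the rule)."""
    n = len(x)
    return tuple(3.5 * statistics.stdev(v) * n ** (-0.25) for v in (x, y))


def rectangular_counts(x, y, widths, origin=(0.0, 0.0)):
    """Count observations in rectangular bins indexed by integer pairs (i, j)."""
    hx, hy = widths
    ox, oy = origin
    return Counter(
        (math.floor((xi - ox) / hx), math.floor((yi - oy) / hy))
        for xi, yi in zip(x, y)
    )


# Simulated uncorrelated normal data (illustrative only, not the PRISM data)
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [rng.gauss(0.0, 2.0) for _ in range(1000)]

widths = normal_reference_widths(x, y)
counts = rectangular_counts(x, y, widths)

# Shifting the mesh origin by half a bin gives a second view of the same data,
# which helps separate true features from artifacts of the mesh.
shifted = rectangular_counts(x, y, widths, origin=(widths[0] / 2, widths[1] / 2))
```

Comparing `counts` with `shifted` is one inexpensive way to examine more than one mesh for the same data.
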
Of particular interest in the construction of bivariate histograms is the notion of
an optimal bin shape. Three types of regular bins are possible: rectangular,
triangular, and hexagonal. Scott ( ) compared the three shapes and showed that the
hexagonal bins offer a slight improvement in asymptotic MISE when compared to
rectangular bins; triangular bins were substantially worse than the other two. Carr
et al. ( ) also suggest using hexagonal bins, although from a different perspective.
In particular, they show that, when using some sort of glyph representing the
frequency for each bin, a rectangular mesh resulted in a type of visual artifact from the
vertical and horizontal alignment; the hexagonal bin structure has a much improved
visual appeal.
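
Hexagonal bin counts can be obtained without any explicit hexagon geometry. The sketch below treats the hexagon centers as the union of two staggered rectangular lattices and assigns each point to the nearer of its two candidate centers. It is one common way to do this and is not necessarily the procedure used by the cited authors; the function name, the `width` parameter, and the sample points are assumptions made for illustration.

```python
import math
from collections import Counter


def hexbin_counts(x, y, width):
    """Count points in regular hexagonal bins whose centers are `width` apart.

    Returns a Counter keyed by the (cx, cy) coordinates of each bin center.
    """
    height = math.sqrt(3.0) * width  # row spacing within each rectangular lattice
    counts = Counter()
    for xi, yi in zip(x, y):
        # Nearest center on lattice A: (i * width, j * height)
        ax = math.floor(xi / width + 0.5) * width
        ay = math.floor(yi / height + 0.5) * height
        # Nearest center on lattice B, offset by half a cell in both directions
        bx = (math.floor(xi / width) + 0.5) * width
        by = (math.floor(yi / height) + 0.5) * height
        # Keep whichever candidate center is closer to the point
        da = (xi - ax) ** 2 + (yi - ay) ** 2
        db = (xi - bx) ** 2 + (yi - by) ** 2
        counts[(ax, ay) if da <= db else (bx, by)] += 1
    return counts


# Four hypothetical points: two near the origin, two near the offset center
px = [0.1, 0.0, 0.5, 0.45]
py = [0.1, -0.1, 0.8, 0.9]
hex_counts = hexbin_counts(px, py, width=1.0)
```

Because every point goes to its nearest center on a triangular lattice, the resulting cells are regular hexagons.
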
Displaying bivariate histograms can be accomplished in a number of ways. A
traditional bivariate histogram with a rectangular mesh can be displayed using a type of
3-D bar chart. Often structure in the histograms of this type can be obscured by the
viewing angle. Rotating the display and viewing the histogram at different viewing
angles can reveal this hidden structure. In a non-interactive setting, such operations
are of course unavailable. An alternative for displaying bivariate histograms is the
use of the so-called image plot in which color is used to represent the frequency or
density of points in each bin.
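
The idea behind an image plot can be shown with a text stand-in, where a ramp of characters plays the role of the color scale. This is only a sketch: the function name, the character ramp, and the demo counts are hypothetical, and a real image plot would map the same levels to colors in a plotting library.

```python
def image_plot_rows(counts, ramp=" .:-=+*#%@"):
    """Render bin counts as text rows, with denser characters for fuller bins.

    `counts` maps integer bin indices (i, j) to frequencies.
    """
    if not counts:
        return []
    i_vals = [i for i, _ in counts]
    j_vals = [j for _, j in counts]
    top = max(counts.values())
    rows = []
    # Highest j first so the vertical axis points upward, as in a plot
    for j in range(max(j_vals), min(j_vals) - 1, -1):
        row = ""
        for i in range(min(i_vals), max(i_vals) + 1):
            c = counts.get((i, j), 0)
            # Linear map from count to a ramp position; occupied bins stay visible
            level = max(1, round(c / top * (len(ramp) - 1))) if c else 0
            row += ramp[level]
        rows.append(row)
    return rows


# Hypothetical counts for a 3 x 2 mesh
demo = {(0, 0): 1, (1, 0): 4, (1, 1): 10, (2, 1): 3}
rows = image_plot_rows(demo)
print("\n".join(rows))
```

Unlike the 3-D bar chart, this view has no viewing angle, so no bin can hide behind another.
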
An example demonstrating the need and benefit of bivariate histograms is shown
in Fig. . . In the left frame, a scatterplot is shown using the average winter midpoint
temperature and the log-transformed total precipitation for the Colorado PRISM
data. The scatterplot in Fig. . clearly demonstrates the phenomenon of “too much
ink” and little of the structure in the data can be discerned from the plot. In the right