fect of bin width suggests that, for visualization purposes, it is wise to view a collection
of histograms with more than one choice of bin width; a particular sequence recommended
in practice is $h = \ldots$ for $k = \ldots$, until $h$ is obviously too narrow.
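The idea of scanning a decreasing sequence of bin widths can be sketched as follows. This is a minimal illustration, not the recommended sequence from the text: the starting width `h_start`, the geometric `shrink` factor, the helper names, and the simulated sample are all assumptions made for the example.

```python
import math
import random

def histogram_counts(data, h, origin=0.0):
    """Count observations in bins [origin + j*h, origin + (j+1)*h)."""
    counts = {}
    for x in data:
        j = math.floor((x - origin) / h)
        counts[j] = counts.get(j, 0) + 1
    return counts

def bin_width_sequence(h_start, shrink, steps):
    """Geometric sequence h_start / shrink**k for k = 0, ..., steps - 1."""
    return [h_start / shrink ** k for k in range(steps)]

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]

# h_start and shrink are illustrative assumptions, not values from the text.
widths = bin_width_sequence(h_start=1.0, shrink=1.5, steps=4)
for h in widths:
    counts = histogram_counts(sample, h)
    print(f"h = {h:.3f}: {len(counts)} occupied bins")
```

In practice each set of counts would be drawn as a histogram, and the sequence would be stopped by eye once the display becomes too rough.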
The exact locations of the bin edges do not enter into the expression for $h$ because
the bin edge is generally theoretically less important than the bin width. The
exception occurs when $f$ has a known point of discontinuity. Placing a bin edge at
such a point will provide a simple, automatic adjustment. As 0 is frequently a point
of discontinuity, selecting bin edges as $t_0 = 0$, $t_j = jh$, is often a wise choice. Despite
the lower-order theoretical effect, placement of bin edges can have substantial visual
effect on histograms of moderate sample size. As with bin width, inspecting multiple
histograms with different bin edges is highly recommended. A Java applet is available
in the Rice Virtual Lab in Statistics to explore the interaction of bin width and bin
edge selections; see the histogram applet at http://www.ruf.rice.edu/~lane/rvls.html.
Features that appear in most or all alternative histogram views should be given much
more credence than those that appear only once.
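A small sketch of why the bin origin matters when the density has a discontinuity at zero. The data values, the bin width, and the function name `histogram_density` are hypothetical; the only point illustrated is that edges at multiples of the bin width keep all mass on the nonnegative side, while a shifted origin produces a bin that straddles zero.

```python
import math

def histogram_density(data, h, origin=0.0):
    """Return {j: density} for bins [origin + j*h, origin + (j+1)*h)."""
    n = len(data)
    counts = {}
    for x in data:
        j = math.floor((x - origin) / h)
        counts[j] = counts.get(j, 0) + 1
    return {j: c / (n * h) for j, c in counts.items()}

# Hypothetical nonnegative data, so the true density jumps at 0.
waits = [0.1, 0.3, 0.4, 0.9, 1.2, 1.3, 2.5, 2.6, 3.8, 5.1]
h = 1.0

anchored = histogram_density(waits, h, origin=0.0)   # edges at j*h
shifted = histogram_density(waits, h, origin=-0.5)   # edges at -0.5 + j*h

# Left edge of the lowest occupied bin under each choice of origin.
anchored_left = 0.0 + min(anchored) * h
shifted_left = -0.5 + min(shifted) * h
print(anchored_left, shifted_left)
```

With the shifted origin, the lowest bin covers part of the negative axis, so the histogram assigns density to values that cannot occur. Comparing several origins in this way is one simple check on which features are stable.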
5.1.2 Improved Binned Density Estimates
Visual and theoretical improvement can be obtained in density estimates from
histogram-style binned data through the use of interpolation. The oldest such technique
is the frequency polygon, generated by linearly interpolating histogram bin centers.
By this means, the slope of $f$ may be tracked, and the bias improves from $O(h)$,
depending on $f'(x)$, to $O(h^2)$, depending on $f''(x)$; details may be found in Scott
( ).
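The frequency polygon construction can be sketched in a few lines. The function name, the padding with one empty bin on each side (so the polygon returns to zero), and the example bin heights are assumptions made for illustration.

```python
def frequency_polygon(heights, origin, h):
    """Build a frequency polygon from histogram density heights.

    heights[j] is the histogram density on [origin + j*h, origin + (j+1)*h).
    One empty bin is added on each side so the polygon falls to zero.
    """
    padded = [0.0] + list(heights) + [0.0]
    first_center = origin - h / 2  # center of the left padding bin

    def fp(x):
        u = (x - first_center) / h      # position in units of bin centers
        if u <= 0 or u >= len(padded) - 1:
            return 0.0
        j = int(u)                      # index of the center to the left of x
        w = u - j                       # fractional distance to the next center
        return (1 - w) * padded[j] + w * padded[j + 1]

    return fp

# Hypothetical histogram: three unit-width bins starting at 0.
fp = frequency_polygon([0.2, 0.5, 0.3], origin=0.0, h=1.0)
print(fp(0.5), fp(1.0), fp(1.5))
```

At a bin center the polygon reproduces the histogram height, and between centers it follows the straight line joining the two neighboring heights, which is what lets it track the slope of the density.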
The edge frequency polygon of Jones et al. ( ) instead interpolates the
histogram bin edges, at heights representing the averages of adjacent histogram bin
heights. The result reduces variance and optimal MISE, at the cost of a small increase
in bias.
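A minimal sketch of the edge-based construction just described, following only the verbal description above: knots are placed at the bin edges, each at the average of the two adjacent bin heights. The function name, the treatment of bins outside the data as empty, and the example heights are assumptions.

```python
def edge_frequency_polygon(heights, origin, h):
    """Interpolate linearly between knots placed at the bin edges.

    heights[j] is the histogram density on [origin + j*h, origin + (j+1)*h).
    Bins outside the histogram are treated as empty (an assumption).
    """
    padded = [0.0, 0.0] + list(heights) + [0.0, 0.0]
    # knots[i] sits at the edge origin + (i - 1)*h, which separates
    # padded[i] from padded[i + 1]; its height is their average.
    knots = [(padded[i] + padded[i + 1]) / 2 for i in range(len(padded) - 1)]
    first_edge = origin - h

    def efp(x):
        u = (x - first_edge) / h        # position in units of bin edges
        if u <= 0 or u >= len(knots) - 1:
            return 0.0
        j = int(u)                      # index of the edge to the left of x
        w = u - j
        return (1 - w) * knots[j] + w * knots[j + 1]

    return efp

# Hypothetical histogram: three unit-width bins starting at 0.
efp = edge_frequency_polygon([0.2, 0.5, 0.3], origin=0.0, h=1.0)
print(efp(0.0), efp(1.0), efp(2.0), efp(3.0))
```

Because each knot averages two bin heights, the curve is smoother than the ordinary frequency polygon built from the same bins, in line with the variance reduction noted above.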
Minnotte's ( ) bias-optimized frequency polygon interpolates histogram bin
centers, but at heights calculated to ensure that the multinomial probabilities
represented by the bin data proportions are maintained. Although the estimates may go
negative, and have higher optimal MISE properties than the edge frequency polygon,
their minimal bias recommends them in cases where large amounts of data are
collected into coarse bins and no finer binning is possible. These can be improved
still further by interpolating with cubic or higher-order splines. The resulting
higher-order histosplines (Minnotte, ) achieve $O(h^{\ldots})$ or higher levels of bias, and can
strongly outperform other estimates when large samples are prebinned into wide
bins.
Figure . shows examples of the standard frequency polygon, edge frequency
polygon, bias-optimized frequency polygon, and cubic higher-order histospline
computed from the histogram information of Fig. . .
The effects of the bin origin nuisance parameter can be minimized by computing
several histograms, each with the same bin width, but with different bin origins.
Scott's ( ) averaged shifted histogram (ASH) is a weighted average of $m$
histograms, $f_1, f_2, \ldots, f_m$, all constructed with the same bin width, $h$, but with shifted