Density estimation and related methods provide a powerful set of tools for
visualization of data-based distributions in one, two, and higher dimensions. This
chapter examines a variety of such estimators, as well as the various issues related
to their theoretical quality and practical application.
The goal of understanding data leads to the notion of extracting as much information
as possible. This begins by understanding how each individual variable varies. If
the parametric form is well known, then a few statistics answer the question.
However, if the parametric form is unknown, then visual examination of a
well-constructed nonparametric density estimate is the recommended route. In this
way, features of the density, such as the number and location of modes, can be
identified. With multivariate density estimates, we can extend this ability to
understand the distributional relationships between variables in two or more
dimensions.
5.1 Univariate Density Estimates
Given a univariate sample $x_1, \ldots, x_n \sim f(x)$ of an unknown parametric
form, visualization of an estimate of $f$ is an important part of the analysis
process for multiple reasons. It allows for direct examination of possibly important
structure in $f$, such as skewness or multiple modes. In addition, it provides
a means of considering assumptions such as that of normality for further analysis.
Such visualization can provide an alternative to a formal goodness-of-fit test,
particularly for large sample sizes where such tests may reject even quite
reasonable models.
5.1.1 Histograms
The form of density estimation most familiar to analysts is the univariate histogram.
While it owes its popularity to its simplicity, the histogram can provide a serviceable,
if crude, idea of a dataset's distribution.
Construction of a histogram requires a mesh of bin edges,
$t_0 < t_1 < t_2 < \cdots < t_k$, covering the range of the data. Define the $j$th
bin as $B_j = [t_j, t_{j+1})$, its width as $h_j = t_{j+1} - t_j$, and its count as
$\nu_j = \sum_{i=1}^{n} I_{B_j}(x_i)$, where $I_A(x)$ is an indicator function,
taking the value 1 if $x \in A$ and 0 otherwise. A frequency histogram plots

$$g(x) = \nu_j, \qquad x \in B_j, \tag{5.1}$$
while a percentage histogram plots

$$g(x) = \frac{\nu_j}{n}, \qquad x \in B_j. \tag{5.2}$$
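The bin counts and the two histogram heights defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy sample, and the mesh are hypothetical and do not come from the text, and points falling outside the mesh are simply ignored.

```python
from bisect import bisect_right


def bin_counts(data, edges):
    """Return the counts nu_j of points falling in B_j = [t_j, t_{j+1})."""
    k = len(edges) - 1              # number of bins
    counts = [0] * k
    for x in data:
        # Index of the last edge that is <= x, i.e. the bin's left edge.
        j = bisect_right(edges, x) - 1
        if 0 <= j < k:              # points outside [t_0, t_k) are ignored
            counts[j] += 1
    return counts


def frequency_histogram(data, edges):
    """Bar heights of a frequency histogram: g(x) = nu_j for x in B_j."""
    return bin_counts(data, edges)


def percentage_histogram(data, edges):
    """Bar heights of a percentage histogram: g(x) = nu_j / n for x in B_j."""
    n = len(data)                   # assumes the mesh covers all of the data
    return [c / n for c in bin_counts(data, edges)]


# Hypothetical toy sample and equally spaced mesh (h = 1).
sample = [0.2, 0.7, 1.1, 1.5, 1.9, 2.4, 2.5, 3.8]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]

freq = frequency_histogram(sample, edges)
pct = percentage_histogram(sample, edges)
print(freq)
print(pct)
```

Using `bisect_right` places a point that sits exactly on an edge into the bin to its right, which matches the half-open bins $[t_j, t_{j+1})$.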
As long as the widths for all bins equal a constant, $h_j = h$, either of these will
give a reasonable visual depiction. But as neither integrates to 1, neither is a true
density estimate. If bin widths vary, neither frequency nor percentage histograms
should be used, as they will give excessive visual weight to the larger bins.
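To see why neither version is a density, one can integrate each over the mesh when all bins share the width $h$. The short derivation below uses only the definitions above and the fact that the bin counts sum to $n$, assuming the mesh covers all of the data. The labels $g_{\text{freq}}$ and $g_{\text{pct}}$ are introduced here only to tell the two histograms apart.

```latex
\int g_{\text{freq}}(x)\,dx = \sum_{j} \nu_j\, h = n h,
\qquad
\int g_{\text{pct}}(x)\,dx = \sum_{j} \frac{\nu_j}{n}\, h = h .
```

The frequency version therefore integrates to 1 only in the special case $h = 1/n$, and the percentage version only when $h = 1$.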