to assess both the overall behavior and unusual occurrences). Second, it plots quantile information (see Section 2.2.2). Let $x_i$, for $i = 1$ to $N$, be the data sorted in increasing order so that $x_1$ is the smallest observation and $x_N$ is the largest for some ordinal or numeric attribute $X$. Each observation, $x_i$, is paired with a percentage, $f_i$, which indicates that approximately $f_i \times 100\%$ of the data are below the value, $x_i$. We say “approximately” because there may not be a value with exactly a fraction, $f_i$, of the data below $x_i$. Note that the 0.25 quantile (the 25th percentile) corresponds to quartile $Q_1$, the 0.50 quantile is the median, and the 0.75 quantile is $Q_3$.
Let

$$f_i = \frac{i - 0.5}{N}. \tag{2.7}$$

These numbers increase in equal steps of $1/N$, ranging from $\frac{1}{2N}$ (which is slightly above 0) to $1 - \frac{1}{2N}$ (which is slightly below 1). On a quantile plot, $x_i$ is graphed against $f_i$. This allows us to compare different distributions based on their quantiles. For example, given the quantile plots of sales data for two different time periods, we can compare their $Q_1$, median, $Q_3$, and other $f_i$ values at a glance.
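The pairing of each sorted observation $x_i$ with $f_i = (i - 0.5)/N$ from Eq. (2.7) can be sketched in a few lines. This is a minimal illustration, not the book's code; the function name `quantile_plot_points` and the price values are hypothetical (they are not the data of Table 2.1).

```python
from typing import List, Tuple

def quantile_plot_points(data: List[float]) -> List[Tuple[float, float]]:
    """Return (f_i, x_i) pairs for a quantile plot, with f_i = (i - 0.5) / N (Eq. 2.7).

    Hypothetical helper for illustration only.
    """
    xs = sorted(data)  # x_1 <= x_2 <= ... <= x_N
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    # i runs from 1 to N, as in the text
    return [((i - 0.5) / n, x) for i, x in enumerate(xs, start=1)]

# Hypothetical unit prices (NOT the data of Table 2.1).
prices = [52, 40, 75, 43, 60, 47, 120, 74]
points = quantile_plot_points(prices)
for f, x in points:
    print(f"f = {f:.4f}  x = {x}")
```

With $N = 8$, the $f$-values start at $1/16 = 0.0625$ and rise in steps of $1/8$ up to $1 - 1/16 = 0.9375$, matching the range stated above. Plotting `x` against `f` gives the quantile plot, and the points nearest $f = 0.25$, $0.50$, and $0.75$ approximate $Q_1$, the median, and $Q_3$.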
Example 2.13 Quantile plot. Figure 2.4 shows a quantile plot for the unit price data of Table 2.1.
Quantile-Quantile Plot
A quantile-quantile plot, or q-q plot, graphs the quantiles of one univariate distribution against the corresponding quantiles of another. It is a powerful visualization tool in that it allows the user to view whether there is a shift in going from one distribution to another.
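To make the idea of “a shift” concrete, the sketch below handles only the simplest case of two samples with the same number of observations: it pairs the $i$-th smallest value of one sample with the $i$-th smallest value of the other. The function name `qq_pairs` and the branch prices are hypothetical, chosen so that the second branch is the first shifted up by a constant.

```python
from typing import List, Tuple

def qq_pairs(x: List[float], y: List[float]) -> List[Tuple[float, float]]:
    """Pair the i-th smallest x with the i-th smallest y.

    Hypothetical helper; this sketch assumes both samples have the same size.
    """
    if len(x) != len(y):
        raise ValueError("this sketch handles only samples of equal size")
    return list(zip(sorted(x), sorted(y)))

# Hypothetical unit prices at two branches; branch 2 is branch 1 shifted up by 10.
branch1 = [40, 43, 47, 52, 60, 74]
branch2 = [p + 10 for p in branch1]

pairs = qq_pairs(branch1, branch2)
offsets = [yv - xv for xv, yv in pairs]  # vertical distance of each point from y = x
print(pairs)
print("offsets y - x:", offsets)
```

Every offset equals 10, so on a q-q plot the points lie on a line parallel to the diagonal $y = x$, shifted up by 10. Points lying on $y = x$ itself would indicate no shift between the two distributions.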
Suppose that we have two sets of observations for the attribute or variable unit price, taken from two different branch locations. Let $x_1, \ldots, x_N$ be the data from the first branch, and $y_1, \ldots, y_M$ be the data from the second, where each data set is sorted in increasing order. If $M = N$ (i.e., the number of points in each set is the same), then we simply plot $y_i$ against $x_i$, where $y_i$ and $x_i$ are both $(i - 0.5)/N$ quantiles of their respective data sets. If $M < N$ (i.e., the second branch has fewer observations than the first), there can be only $M$ points on the q-q plot. Here, $y_i$ is the $(i - 0.5)/M$ quantile of the $y$
Figure 2.4 A quantile plot for the unit price data of Table 2.1. [Plot: data values (0 to 140) on the vertical axis versus f-value (0.00 to 1.00) on the horizontal axis, with Q1, the median, and Q3 marked.]