packages include bar charts, pie charts, and line graphs. Other popular displays of data summaries and distributions include quantile plots, quantile-quantile plots, histograms, and scatter plots.
2.2.1 Measuring the Central Tendency: Mean, Median, and Mode
In this section, we look at various ways to measure the central tendency of data. Suppose that we have some attribute $X$, like salary, which has been recorded for a set of objects. Let $x_1, x_2, \dots, x_N$ be the set of $N$ observed values or observations for $X$. Here, these values may also be referred to as the data set (for $X$). If we were to plot the observations for salary, where would most of the values fall? This gives us an idea of the central tendency of the data. Measures of central tendency include the mean, median, mode, and midrange.
The most common and effective numeric measure of the “center” of a set of data is the (arithmetic) mean. Let $x_1, x_2, \dots, x_N$ be a set of $N$ values or observations, such as for some numeric attribute $X$, like salary. The mean of this set of values is

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}. \tag{2.1}$$
This corresponds to the built-in aggregate function, average (avg() in SQL), provided in relational database systems.
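As a rough illustration of the avg() aggregate, the sketch below uses Python's standard sqlite3 module with an in-memory database. The table name employee, its single salary column, and the three values are hypothetical and chosen only for this example; other relational systems may differ in details such as how NULL values or integer columns are handled.

```python
import sqlite3

# Hypothetical table and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (salary REAL)")
conn.executemany(
    "INSERT INTO employee (salary) VALUES (?)",
    [(40.0,), (50.0,), (60.0,)],
)

# Let the database compute the arithmetic mean with avg().
(sql_mean,) = conn.execute("SELECT avg(salary) FROM employee").fetchone()

# Recompute the same quantity by hand: sum of the values divided by N.
salaries = [row[0] for row in conn.execute("SELECT salary FROM employee")]
manual_mean = sum(salaries) / len(salaries)

conn.close()
print(sql_mean, manual_mean)
```

Both values should agree, since avg() computes the sum of the column values divided by their count.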
Example 2.6 Mean. Suppose we have the following values for salary (in thousands of dollars), shown in increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. Using Eq. (2.1), we have

$$\bar{x} = \frac{30 + 36 + 47 + 50 + 52 + 52 + 56 + 60 + 63 + 70 + 70 + 110}{12} = \frac{696}{12} = 58.$$

Thus, the mean salary is $58,000.
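The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative, and the comparison against statistics.mean from the standard library is included as a sanity check.

```python
from statistics import mean

# Salary values from Example 2.6, in thousands of dollars.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

total = sum(salaries)    # numerator of Eq. (2.1)
n = len(salaries)        # N, the number of observations
mean_salary = total / n  # arithmetic mean

print(total, n, mean_salary)  # prints: 696 12 58.0

# The standard library gives the same result.
assert mean_salary == mean(salaries)
```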
Sometimes, each value $x_i$ in a set may be associated with a weight $w_i$ for $i = 1, \dots, N$. The weights reflect the significance, importance, or occurrence frequency attached to their respective values. In this case, we can compute

$$\bar{x} = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_N x_N}{w_1 + w_2 + \cdots + w_N}. \tag{2.2}$$
This is called the weighted arithmetic mean or the weighted average.
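A minimal sketch of Eq. (2.2) in Python follows. The function name weighted_mean and its error handling are assumptions made for illustration. As a usage example, the distinct salary values from Example 2.6 are weighted by their occurrence frequency (52 and 70 each appear twice), which should reproduce the unweighted mean of the full list.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Distinct salary values from Example 2.6 (in thousands of dollars),
# with occurrence frequencies used as weights.
distinct_salaries = [30, 36, 47, 50, 52, 56, 60, 63, 70, 110]
frequencies = [1, 1, 1, 1, 2, 1, 1, 1, 2, 1]

result = weighted_mean(distinct_salaries, frequencies)
print(result)  # prints: 58.0
```

With all weights equal to 1, the function reduces to the ordinary mean of Eq. (2.1).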