Database Reference
In-Depth Information
• Measures of central tendency, such as mean, median, and mode
• Measures of variation, such as standard deviation and variance
Aside from the median and mode, all of these can be calculated easily by invoking
aggregate functions:
mysql>
SELECT COUNT(score) AS n,
->
SUM(score) AS sum,
->
MIN(score) AS minimum,
->
MAX(score) AS maximum,
->
AVG(score) AS mean,
->
STDDEV_SAMP(score) AS 'std. dev.',
->
VAR_SAMP(score) AS 'variance'
->
FROM testscore;
+----+------+---------+---------+--------+-----------+----------+
| n | sum | minimum | maximum | mean | std. dev. | variance |
+----+------+---------+---------+--------+-----------+----------+
| 20 | 146 | 4 | 10 | 7.3000 | 1.8382 | 3.3789 |
+----+------+---------+---------+--------+-----------+----------+
The
STDDEV_SAMP()
and
VAR_SAMP()
functions produce sample measures rather than
population measures. That is, for a set of
n
values, they produce a result that is based on
n
-1 degrees of freedom. For the population measures, which are based on
n
degrees of
freedom, use
STDDEV_POP()
and
VAR_POP()
instead.
STDDEV()
and
VARIANCE()
are
synonyms for
STDDEV_POP()
and
VAR_POP()
.
Standard deviation can be used to identify outliers—values that are uncharacteristically
far from the mean. For example, to select values that lie more than three standard de‐
viations from the mean, do this:
SELECT
@
mean
:
=
AVG
(
score
),
@
std
:
=
STDDEV_SAMP
(
score
)
FROM
testscore
;
SELECT
score
FROM
testscore
WHERE
ABS
(
score
-@
mean
)
>
@
std
*
3
;
MySQL has no built-in function for computing the mode or median of a set of values,
but you can compute them yourself. To determine the mode (the value that occurs most
frequently), count each value and see which is most common:
mysql>
SELECT score, COUNT(score) AS frequency
->
FROM testscore GROUP BY score ORDER BY frequency DESC;
+-------+-----------+
| score | frequency |
+-------+-----------+
| 9 | 5 |
| 6 | 4 |
| 7 | 4 |
| 4 | 2 |
| 8 | 2 |
| 10 | 2 |
| 5 | 1 |
+-------+-----------+