Statistical Techniques - MySQL

Database Reference

In-Depth Information

-> FROM rainfall AS t1 INNER JOIN rainfall AS t2

-> ON t1.date >= t2.date

-> GROUP BY t1.date;

+------------+--------------+-------------+--------------+-------------+

+------------+--------------+-------------+--------------+-------------+

| 2014-06-01 | 1.50 | 1.50 | 1 | 1.500000 |

| 2014-06-03 | 0.50 | 2.00 | 2 | 1.000000 |

| 2014-06-05 | 1.00 | 3.00 | 3 | 1.000000 |

+------------+--------------+-------------+--------------+-------------+

To fix the problem, determine the number of days elapsed a different way. Take the

minimum and maximum date involved in each sum and calculate a days-elapsed value

from them:

DATEDIFF(MAX(t2.date),MIN(t2.date)) + 1

That value must be used for the days-elapsed column and for computing the running

averages. The resulting statement is as follows:

mysql> SELECT t1.date, t1.precip AS 'daily precip',

-> SUM(t2.precip) AS 'cum. precip',

-> DATEDIFF(MAX(t2.date),MIN(t2.date)) + 1 AS 'days elapsed',

-> SUM(t2.precip) / (DATEDIFF(MAX(t2.date),MIN(t2.date)) + 1)

-> AS 'avg. precip'

-> FROM rainfall AS t1 INNER JOIN rainfall AS t2

-> ON t1.date >= t2.date

-> GROUP BY t1.date;

+------------+--------------+-------------+--------------+-------------+

+------------+--------------+-------------+--------------+-------------+

| 2014-06-01 | 1.50 | 1.50 | 1 | 1.500000 |

| 2014-06-03 | 0.50 | 2.00 | 3 | 0.666667 |

| 2014-06-05 | 1.00 | 3.00 | 5 | 0.600000 |

+------------+--------------+-------------+--------------+-------------+

As this example illustrates, calculation of cumulative values from relative values requires

only a column that enables rows to be placed into the proper order. (For the rainfall

table, that's the date column.) Values in the column need not be sequential, or even

numeric. This differs from calculations that produce difference values from cumulative

values (see Recipe 15.9 ), which require a table that has a column containing an unbroken

sequence.

The running averages in the rainfall examples are based on dividing cumulative pre‐

cipitation sums by number of days elapsed as of each day. When the table has no gaps,

the number of days is the same as the number of values summed, making it easy to find

successive averages. When rows are missing, the calculations become more complex.

This demonstrates that it's necessary to consider the nature of your data and calculate

averages appropriately. The next example is conceptually similar to the previous ones

MySQL

Search WWH ::

Custom Search

Home