Statistical Techniques - MySQL

This distribution includes rows for scores 0 through 3, none of which appear in the frequency distribution shown earlier.

The same principle applies to relative frequency distributions:

mysql> SET @n = (SELECT COUNT(score) FROM testscore);
mysql> SELECT ref.score, (COUNT(testscore.score)*100)/@n AS percent,
    -> REPEAT('*',(COUNT(testscore.score)*100)/@n) AS 'percent histogram'
    -> FROM ref LEFT JOIN testscore ON ref.score = testscore.score
    -> GROUP BY ref.score;
+-------+---------+---------------------------+
| score | percent | percent histogram         |
+-------+---------+---------------------------+
|     0 |  0.0000 |                           |
|     1 |  0.0000 |                           |
|     2 |  0.0000 |                           |
|     3 |  0.0000 |                           |
|     4 | 10.0000 | **********                |
|     5 |  5.0000 | *****                     |
|     6 | 20.0000 | ********************      |
|     7 | 20.0000 | ********************      |
|     8 | 10.0000 | **********                |
|     9 | 25.0000 | ************************* |
|    10 | 10.0000 | **********                |
+-------+---------+---------------------------+

15.4. Counting Missing Values

Problem

A set of observations is incomplete. You want to find out how much so.

Solution

Count the number of NULL values in the set.

Discussion

Values can be missing from a set of observations for any number of reasons: a test may not yet have been administered, something may have gone wrong during the test that requires invalidating the observation, and so forth. You can represent such observations in a dataset as NULL values to signify that they're missing or otherwise invalid, then use summary statements to characterize the completeness of the dataset.
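As a rough illustration of such a summary, the sketch below relies on the fact that COUNT(*) counts every row, whereas COUNT(col) counts only rows in which col is not NULL, so the difference between the two is the number of missing values. The table name obs_demo, its columns, and its rows are hypothetical and exist only to make the example self-contained:

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE obs_demo
(
  subject INT NOT NULL,
  score   INT NULL        -- NULL marks a missing or invalidated observation
);

INSERT INTO obs_demo (subject, score)
VALUES (1, 38), (2, NULL), (3, 47), (4, NULL), (5, 37);

-- COUNT(*) counts all rows; COUNT(score) skips rows where score is NULL.
SELECT COUNT(*)                                   AS n_total,
       COUNT(score)                               AS n_nonmissing,
       COUNT(*) - COUNT(score)                    AS n_missing,
       ((COUNT(*) - COUNT(score)) * 100) / COUNT(*) AS pct_missing
FROM obs_demo;
```

With the five sample rows above, two of the scores are NULL, so the summary should report two missing values out of five. In MySQL, SUM(score IS NULL) is another way to obtain the missing count, because the IS NULL test evaluates to 1 or 0.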

If a table t contains values to be summarized along a single dimension, a simple summary suffices to characterize the missing values. Suppose that t looks like this:
