This distribution includes rows for scores 0 through 3, none of which appear in the
frequency distribution shown earlier.
The same principle applies to relative frequency distributions:
mysql> SET @n = (SELECT COUNT(score) FROM testscore);
mysql> SELECT ref.score, (COUNT(testscore.score)*100)/@n AS percent,
    -> REPEAT('*',(COUNT(testscore.score)*100)/@n) AS 'percent histogram'
    -> FROM ref LEFT JOIN testscore ON ref.score = testscore.score
    -> GROUP BY ref.score;
+-------+---------+---------------------------+
| score | percent | percent histogram         |
+-------+---------+---------------------------+
|     0 |  0.0000 |                           |
|     1 |  0.0000 |                           |
|     2 |  0.0000 |                           |
|     3 |  0.0000 |                           |
|     4 | 10.0000 | **********                |
|     5 |  5.0000 | *****                     |
|     6 | 20.0000 | ********************      |
|     7 | 20.0000 | ********************      |
|     8 | 10.0000 | **********                |
|     9 | 25.0000 | ************************* |
|    10 | 10.0000 | **********                |
+-------+---------+---------------------------+
15.4. Counting Missing Values
Problem
A set of observations is incomplete. You want to find out how much so.
Solution
Count the number of NULL values in the set.
Discussion
Values can be missing from a set of observations for any number of reasons: a test may
not yet have been administered, something may have gone wrong during the test that
requires invalidating the observation, and so forth. You can represent such observations
in a dataset as NULL values to signify that they're missing or otherwise invalid, then use
summary statements to characterize the completeness of the dataset.
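As a minimal sketch of such a summary statement, the query below relies on the fact that COUNT(*) counts every row, whereas COUNT(expr) counts only rows where expr is not NULL. It reuses the testscore table and score column from the earlier examples, on the assumption that missing observations are stored there as NULL; the column aliases are illustrative only.

```sql
-- Sketch: characterize completeness of the score column.
-- Assumes missing observations are stored as NULL in testscore.score.
SELECT
  COUNT(*) AS total,                          -- all rows, including NULL scores
  COUNT(score) AS nonmissing,                 -- rows with a non-NULL score
  COUNT(*) - COUNT(score) AS missing,         -- rows with a NULL score
  ((COUNT(*) - COUNT(score)) * 100) / COUNT(*) AS pct_missing
FROM testscore;
```

In MySQL, the missing count can also be written as SUM(score IS NULL), because the comparison evaluates to 1 for NULL values and 0 otherwise. Note that SUM() returns NULL rather than 0 if the table is empty.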
If a table t contains values to be summarized along a single dimension, a simple summary
suffices to characterize the missing values. Suppose that t looks like this: