Database Reference
In-Depth Information
could have been written using string quoting; I used backticks there to avoid mixing
alias quoting styles within a given query.
How Repetitive Is a Set of Values?
To assess how much repetition is present in a set of values, use the ratio of COUNT(DIS
TINCT) and COUNT() . If all values are unique, both counts are the same and the ratio is
1. This is the case for the t values in the mail table and the pop values in the states
table:
mysql> SELECT COUNT(DISTINCT t) / COUNT(t) FROM mail;
+------------------------------+
| COUNT(DISTINCT t) / COUNT(t) |
+------------------------------+
| 1.0000 |
+------------------------------+
mysql> SELECT COUNT(DISTINCT pop) / COUNT(pop) FROM states;
+----------------------------------+
| COUNT(DISTINCT pop) / COUNT(pop) |
+----------------------------------+
| 1.0000 |
+----------------------------------+
For a more repetitive set of values, COUNT(DISTINCT) is less than COUNT() , and the ratio
is smaller:
mysql> SELECT COUNT(DISTINCT name) / COUNT(name) FROM driver_log;
+------------------------------------+
| COUNT(DISTINCT name) / COUNT(name) |
+------------------------------------+
| 0.3000 |
+------------------------------------+
What's the practical use for this ratio? A result close to zero indicates a high degree of
repetition, which means the values will group into a small number of categories natu‐
rally. A result of 1 or close to it indicates many unique values, with the consequence that
GROUP BY won't be very efficient for grouping the values into categories. (That is, there
will be a lot of categories, relative to the number of values.) This tells you that, to generate
a summary, you'll probably find it necessary to impose an artificial categorization on
the values, using the techniques described in this recipe.
8.11. Finding Smallest or Largest Summary Values
Problem
You want to compute per-group summary values but display only the smallest or largest
of them.
 
Search WWH ::




Custom Search