Databases Reference
In-Depth Information
| 48 | Yaound |
| 47 | Tel Aviv-Jaffa |
| 47 | Shimoga |
| 45 | Cabuyao |
| 45 | Callao |
| 45 | Bislig |
+-----+----------------+
Notice that there are roughly 45 to 65 occurrences of each value. Now we find the most
frequently occurring city name prefixes , beginning with three-letter prefixes:
mysql> SELECT COUNT(*) AS cnt, LEFT(city, 3) AS pref
-> FROM sakila.city_demo GROUP BY pref ORDER BY cnt DESC LIMIT 10;
+-----+------+
| cnt | pref |
+-----+------+
| 483 | San |
| 195 | Cha |
| 177 | Tan |
| 167 | Sou |
| 163 | al- |
| 163 | Sal |
| 146 | Shi |
| 136 | Hal |
| 130 | Val |
| 129 | Bat |
+-----+------+
There are many more occurrences of each prefix, so there are many fewer unique pre-
fixes than unique full-length city names. The idea is to increase the prefix length until
the prefix becomes nearly as selective as the full length of the column. A little experi-
mentation shows that 7 is a good value:
mysql> SELECT COUNT(*) AS cnt, LEFT(city, 7) AS pref
-> FROM sakila.city_demo GROUP BY pref ORDER BY cnt DESC LIMIT 10;
+-----+---------+
| cnt | pref |
+-----+---------+
| 70 | Santiag |
| 68 | San Fel |
| 65 | London |
| 61 | Valle d |
| 49 | Hiroshi |
| 48 | Teboksa |
| 48 | Pak Kre |
| 48 | Yaound |
| 47 | Tel Avi |
| 47 | Shimoga |
+-----+---------+
Another way to calculate a good prefix length is by computing the full column's selec-
tivity and trying to make the prefix's selectivity close to that value. Here's how to find
the full column's selectivity:
 
Search WWH ::




Custom Search