Databases Reference
In-Depth Information
Let's use the following query as an example:
SELECT * FROM payment WHERE staff_id = 2 AND customer_id = 584;
Should you create an index on (staff_id, customer_id) , or should you reverse the
column order? We can run some quick queries to help examine the distribution of
values in the table and determine which column has a higher selectivity. Let's transform
the query to count the cardinality of each predicate 5 in the WHERE clause:
mysql> SELECT SUM(staff_id = 2), SUM(customer_id = 584) FROM payment\G
*************************** 1. row ***************************
SUM(staff_id = 2): 7992
SUM(customer_id = 584): 30
According to the rule of thumb, we should place customer_id first in the index, because
the predicate matches fewer rows in the table. We can then run the query again to see
how selective staff_id is within the range of rows selected by this specific customer ID:
mysql> SELECT SUM(staff_id = 2) FROM payment WHERE customer_id = 584\G
*************************** 1. row ***************************
SUM(staff_id = 2): 17
Be careful with this technique, because the results depend on the specific constants
supplied for the chosen query. If you optimize your indexes for this query and other
queries don't fare as well, the server's performance might suffer overall, or some queries
might run unpredictably.
If you're using the “worst” sample query from a report from a tool such as pt-query-
digest , this technique can be an effective way to see what might be the most helpful
indexes for your queries and your data. But if you don't have specific samples to run,
it might be better to use the old rule of thumb, which is to look at the cardinality across
the board, not just for one query:
mysql> SELECT COUNT(DISTINCT staff_id)/COUNT(*) AS staff_id_selectivity,
> COUNT(DISTINCT customer_id)/COUNT(*) AS customer_id_selectivity,
> COUNT(*)
> FROM payment\G
*************************** 1. row ***************************
staff_id_selectivity: 0.0001
customer_id_selectivity: 0.0373
COUNT(*): 16049
customer_id has higher selectivity, so again the answer is to put that column first in the
index:
mysql> ALTER TABLE payment ADD KEY(customer_id, staff_id);
As with prefix indexes, problems often arise from special values that have higher than
normal cardinality. For example, we have seen applications treat users who aren't log-
ged in as “guest” users, who get a special user ID in session tables and other places
where user activity is recorded. Queries involving that user ID are likely to behave very
5. Optimizer geeks call this a “sarg,” for “searchable argument.” Now you're a geek, too!
 
Search WWH ::




Custom Search