Indexing for High Performance - High Performance MySQL

Databases Reference

In-Depth Information

Let's use the following query as an example:

SELECT * FROM payment WHERE staff_id = 2 AND customer_id = 584;

Should you create an index on (staff_id, customer_id) , or should you reverse the

column order? We can run some quick queries to help examine the distribution of

values in the table and determine which column has a higher selectivity. Let's transform

the query to count the cardinality of each predicate 5 in the WHERE clause:

mysql> SELECT SUM(staff_id = 2), SUM(customer_id = 584) FROM payment\G

*************************** 1. row ***************************

SUM(staff_id = 2): 7992

SUM(customer_id = 584): 30

According to the rule of thumb, we should place customer_id first in the index, because

the predicate matches fewer rows in the table. We can then run the query again to see

how selective staff_id is within the range of rows selected by this specific customer ID:

mysql> SELECT SUM(staff_id = 2) FROM payment WHERE customer_id = 584\G

*************************** 1. row ***************************

SUM(staff_id = 2): 17

Be careful with this technique, because the results depend on the specific constants

supplied for the chosen query. If you optimize your indexes for this query and other

queries don't fare as well, the server's performance might suffer overall, or some queries

might run unpredictably.

If you're using the “worst” sample query from a report from a tool such as pt-query-

digest , this technique can be an effective way to see what might be the most helpful

indexes for your queries and your data. But if you don't have specific samples to run,

it might be better to use the old rule of thumb, which is to look at the cardinality across

the board, not just for one query:

mysql> SELECT COUNT(DISTINCT staff_id)/COUNT(*) AS staff_id_selectivity,

> COUNT(DISTINCT customer_id)/COUNT(*) AS customer_id_selectivity,

> COUNT(*)

> FROM payment\G

*************************** 1. row ***************************

staff_id_selectivity: 0.0001

customer_id_selectivity: 0.0373

COUNT(*): 16049

customer_id has higher selectivity, so again the answer is to put that column first in the

index:

mysql> ALTER TABLE payment ADD KEY(customer_id, staff_id);

As with prefix indexes, problems often arise from special values that have higher than

normal cardinality. For example, we have seen applications treat users who aren't log-

ged in as “guest” users, who get a special user ID in session tables and other places

where user activity is recorded. Queries involving that user ID are likely to behave very

5. Optimizer geeks call this a “sarg,” for “searchable argument.” Now you're a geek, too!

Search WWH ::

Custom Search

Home