Query Performance Optimization - High Performance MySQL

Databases Reference

In-Depth Information

Most uses of SELECT FOR UPDATE can be rewritten to use a similar technique.

The final task is to clean up rows that were claimed but never processed because the

worker quit for some reason, but that's easy. You can just run an UPDATE to reset them

periodically. Execute SHOW PROCESSLIST , gather a list of all the thread IDs that are cur-

rently connected to the server, and use that in the WHERE clause to avoid stealing a row

that's actually being processed. Assuming the list of thread IDs is ( 10, 20, 30 ), here's

a sample query that “times out” and reclaims rows after 10 minutes:

UPDATE unsent_emails

SET owner = 0, status = 'unsent'

WHERE owner NOT IN(0, 10, 20, 30) AND status = 'claimed'

AND ts < CURRENT_TIMESTAMP - INTERVAL 10 MINUTE;

By the way, notice how the index is carefully designed for the queries we're running.

This is an example of the interplay between this chapter and the previous one. The

query we just showed will be able to use the full width of the index, because the range

condition is placed on the last column in the index. The index will also be useful for

the other queries; this avoids the need for another redundant index for the two columns

used by the other queries.

We've illustrated a few fundamentals in this case study:

• Stop doing things, or do them less often. Don't use polling unless you have to,

because it adds load and unproductive busywork.

• Do things more quickly. Use an UPDATE instead of a SELECT FOR UPDATE followed by

an UPDATE , because the faster the transaction commits, the shorter the lock duration

is, and the less contention and serialization there are. Also, keep the unprocessed

data separate from the processed rows, because smaller is faster.

• The overall moral of this example is that some queries can't be optimized; they

must be replaced with a different query or a different strategy altogether. SELECT

FOR UPDATE queries usually fall into that category.

Sometimes, the best solution is to move the queue outside of the database server en-

tirely. Redis is good at queue operations, and occasionally you can use memcached

for this purpose, too. Alternatively, you might evaluate the Q4M storage engine for

MySQL, although we have no experience using it in production environments so we

can't provide any guidance here. RabbitMQ and Gearman 21 can be very helpful for

some purposes, too.

Computing the Distance Between Points

Geospatial computations crop up now and again in our work. People don't tend to use

MySQL for heavy spatial computation—PostgreSQL is usually a much better choice

21. See http://www.rabbitmq.com and http://gearman.org .

Search WWH ::

Custom Search

Home