Databases Reference
In-Depth Information
Most uses of SELECT FOR UPDATE can be rewritten to use a similar technique.
The final task is to clean up rows that were claimed but never processed because the
worker quit for some reason, but that's easy. You can just run an UPDATE to reset them
periodically. Execute SHOW PROCESSLIST , gather a list of all the thread IDs that are cur-
rently connected to the server, and use that in the WHERE clause to avoid stealing a row
that's actually being processed. Assuming the list of thread IDs is ( 10, 20, 30 ), here's
a sample query that “times out” and reclaims rows after 10 minutes:
UPDATE unsent_emails
SET owner = 0, status = 'unsent'
WHERE owner NOT IN(0, 10, 20, 30) AND status = 'claimed'
AND ts < CURRENT_TIMESTAMP - INTERVAL 10 MINUTE;
By the way, notice how the index is carefully designed for the queries we're running.
This is an example of the interplay between this chapter and the previous one. The
query we just showed will be able to use the full width of the index, because the range
condition is placed on the last column in the index. The index will also be useful for
the other queries; this avoids the need for another redundant index for the two columns
used by the other queries.
We've illustrated a few fundamentals in this case study:
• Stop doing things, or do them less often. Don't use polling unless you have to,
because it adds load and unproductive busywork.
• Do things more quickly. Use an UPDATE instead of a SELECT FOR UPDATE followed by
an UPDATE , because the faster the transaction commits, the shorter the lock duration
is, and the less contention and serialization there are. Also, keep the unprocessed
data separate from the processed rows, because smaller is faster.
• The overall moral of this example is that some queries can't be optimized; they
must be replaced with a different query or a different strategy altogether. SELECT
FOR UPDATE queries usually fall into that category.
Sometimes, the best solution is to move the queue outside of the database server en-
tirely. Redis is good at queue operations, and occasionally you can use memcached
for this purpose, too. Alternatively, you might evaluate the Q4M storage engine for
MySQL, although we have no experience using it in production environments so we
can't provide any guidance here. RabbitMQ and Gearman 21 can be very helpful for
some purposes, too.
Computing the Distance Between Points
Geospatial computations crop up now and again in our work. People don't tend to use
MySQL for heavy spatial computation—PostgreSQL is usually a much better choice
21. See http://www.rabbitmq.com and http://gearman.org .
 
Search WWH ::




Custom Search