Databases Reference
In-Depth Information
Case Studies
Sometimes it's not about query optimization, schema optimization, index optimiza-
tion, or application design optimization—it's about all of these practices put together.
The case studies in this section illustrate how to approach some design challenges that
frequently cause problems for users. You might also be interested in Bill Karwin's book
SQL Antipatterns (Pragmatic Bookshelf). It has recipes for solving particular problems
with SQL that often trap the unwary programmer into a poor solution.
Building a Queue Table in MySQL
Building a queue in MySQL is tricky, and most designs we've seen don't work well
when the system experiences high traffic and lots of concurrency. The typical pattern
is to have a table that contains several types of rows: rows that haven't been processed,
rows in process, and finished rows. One or more worker processes look for unprocessed
rows, update them to “claim” them, and then perform the work and update them to
mark them as finished. Common examples include emails that are ready to send, orders
to process, comments to moderate, and so on.
There are two broad reasons why this doesn't work well. First, the table tends to grow
very large, and searching for the unprocessed rows becomes slow when the table is large
and the indexes are many levels deep. You can solve this by splitting the queue into
two tables and moving the completed rows to the archive or history table, which helps
keep the queue table small.
The second reason is that the process of finding work to do is usually implemented
with polling and locking. Polling creates load on the server, and locking creates con-
tention and serialization between worker processes. We'll see later, in Chapter 11 , why
that limits scalability.
Polling might actually be okay, but if it's not, you can use notifications to tell workers
that there's work to do. One technique is to use the SLEEP() function with a very long
timeout and an indicative comment, such as the following:
SELECT /* waiting on unsent_emails */ SLEEP(10000);
This will cause the thread to block until one of two things happens: it times out after
10,000 seconds, or another thread issues KILL QUERY and terminates it. So, after insert-
ing a batch of queries into the table, you can look at SHOW PROCESSLIST , find threads
that are running queries with the magical comment, and kill the queries. You can also
implement a form of notification with the GET_LOCK() and RELEASE_LOCK() functions,
or you can do it outside of the database, with a messaging service.
The final problem is how workers should claim rows so that they don't get processed
multiple times. We often see this implemented with SELECT FOR UPDATE . This is usually
a huge scalability bottleneck and causes a lot of pileups as transactions block on each
other and wait.
 
Search WWH ::




Custom Search