Databases Reference
In-Depth Information
just been discussing are usually far better options, and there are lots more ways to apply
them than you might think. However, we know of a small handful of large applications
that benefit from prefetching data from disk.
There are two workable implementations for this. One idea is to use a program that
reads slightly ahead of the replica's SQL thread in the relay logs and executes the queries
as SELECT statements. This causes the server to fetch some of the data from the disk into
memory, so when the replica's SQL thread executes the statement from the relay log,
it doesn't need to wait for data to be fetched from disk. In effect, the SELECT parallelizes
I/O that the replica SQL thread must normally do serially. While one statement is
changing data, the next statement's data is being fetched from disk into memory.
The following conditions might indicate that prefetching will work:
• The replication SQL thread is I/O-bound, but the replica server isn't I/O-bound
overall. A completely I/O-bound server won't benefit from prefetching, because it
won't have any idle hard drives to do the work.
• The replica has a lot of disk drives—perhaps eight or more drives per replica.
• You use the InnoDB storage engine, and the working set is much too large to fit in
memory.
An example workload that benefits from prefetching is one with a lot of widely scattered
single-row UPDATE statements, which are typically high-concurrency on the master.
DELETE statements might also benefit from this approach, but INSERT statements are less
likely to—especially when rows are inserted sequentially—because the end of the index
will already be “hot” from previous inserts.
If a table has many indexes, it might not be possible to prefetch all the data the statement
will modify. The UPDATE statement might modify every index, but the SELECT will typi-
cally read only the primary key and one secondary index, in the best case. The UPDATE
will still need to fetch other indexes for modification. That decreases how effective this
tactic can be on tables with many indexes.
This technique is not a silver bullet. There are many reasons why it might not work for
you or might even cause more problems. You should attempt it only if you know your
hardware and operating system well. We know some people for whom this approach
increased replication speed by 300% to 400%, but we've tried it ourselves many times
and found it usually doesn't work. Getting the parameters right is important, but there
isn't always a right combination of parameters.
The mk-slave-prefetch tool, which is part of Maatkit, is one implementation of the ideas
we've described in this section. It has a lot of sophisticated features to try to work in
as many cases as possible, but the drawback is that it has a lot of complexity and requires
a lot of expertise to use. Another is Anders Karlsson's slavereadahead tool, available
from http://sourceforge.net/projects/slavereadahead/ .
 
Search WWH ::




Custom Search