Databases Reference
In-Depth Information
and SELECT parts of the query, you can fetch the results into your application and then
insert them back into the master. First, perform the following query on the master:
mysql> SELECT col1, sum(col2, ...) FROM main_db.enormous_table GROUP BY col1;
You can then insert the results back into the summary table by repeating the following
query for every row in the result set:
mysql> REPLACE INTO main_db.summary_table (col1, col2, ...) VALUES (?, ?, ...);
Again, you've spared the replicas from the large GROUP BY portion of the query; sepa-
rating the SELECT from the REPLACE means that the SELECT part of the query isn't replayed
on every replica.
This general strategy—saving the replicas from the expensive portion of a write—can
help in many cases where you have queries whose results are expensive to calculate but
cheap to handle once they've been calculated.
Do writes in parallel outside of replication
Another tactic for avoiding excessive lag on the replicas is to circumvent replication.
Any writes you do on the master must be serialized on the replica, so it makes sense to
think of “serialized writes” as a scarce resource. Do all your writes need to flow from
the master to the replica? How can you reserve your replica's limited serialized write
capacity for the writes that really need to be done via replication?
Thinking of it in this light might help you prioritize writes. In particular, if you can
identify some writes that are easy to do outside of replication, you can parallelize writes
that would otherwise claim precious write capacity on the replica.
One great example is data archiving, which we discussed earlier in this chapter. OLTP
archiving queries are often simple single-row operations. If you're just moving
unneeded rows from one table to another, there might be no reason these writes have
to be replicated to replicas. Instead, you can disable binary logging for the archiving
statements, and then run separate but identical archiving processes on the master and
replicas.
It might sound crazy to copy the data to another server yourself instead of letting rep-
lication do it, but it can actually make sense for some applications. This is especially
true if an application is the only source of updates to a certain set of tables. Replication
bottlenecks often center around a small set of tables, and if you can handle just those
tables outside of replication, you might be able to speed it up significantly.
Prime the cache for the replication thread
If you have the right kind of workload, you might benefit from parallelizing I/O on
replicas by prefetching data into memory. This technique is not well known, for good
reason. Most people should not use it, because it won't work unless you have the right
workload characteristics and hardware configuration. The other types of changes we've
 
Search WWH ::




Custom Search