Replication - High Performance MySQL

Databases Reference

In-Depth Information

and SELECT parts of the query, you can fetch the results into your application and then

insert them back into the master. First, perform the following query on the master:

mysql> SELECT col1, sum(col2, ...) FROM main_db.enormous_table GROUP BY col1;

You can then insert the results back into the summary table by repeating the following

query for every row in the result set:

mysql> REPLACE INTO main_db.summary_table (col1, col2, ...) VALUES (?, ?, ...);

Again, you've spared the replicas from the large GROUP BY portion of the query; sepa-

rating the SELECT from the REPLACE means that the SELECT part of the query isn't replayed

on every replica.

This general strategy—saving the replicas from the expensive portion of a write—can

help in many cases where you have queries whose results are expensive to calculate but

cheap to handle once they've been calculated.

Do writes in parallel outside of replication

Another tactic for avoiding excessive lag on the replicas is to circumvent replication.

Any writes you do on the master must be serialized on the replica, so it makes sense to

think of “serialized writes” as a scarce resource. Do all your writes need to flow from

the master to the replica? How can you reserve your replica's limited serialized write

capacity for the writes that really need to be done via replication?

Thinking of it in this light might help you prioritize writes. In particular, if you can

identify some writes that are easy to do outside of replication, you can parallelize writes

that would otherwise claim precious write capacity on the replica.

One great example is data archiving, which we discussed earlier in this chapter. OLTP

archiving queries are often simple single-row operations. If you're just moving

unneeded rows from one table to another, there might be no reason these writes have

to be replicated to replicas. Instead, you can disable binary logging for the archiving

statements, and then run separate but identical archiving processes on the master and

replicas.

It might sound crazy to copy the data to another server yourself instead of letting rep-

lication do it, but it can actually make sense for some applications. This is especially

true if an application is the only source of updates to a certain set of tables. Replication

bottlenecks often center around a small set of tables, and if you can handle just those

tables outside of replication, you might be able to speed it up significantly.

Prime the cache for the replication thread

If you have the right kind of workload, you might benefit from parallelizing I/O on

replicas by prefetching data into memory. This technique is not well known, for good

reason. Most people should not use it, because it won't work unless you have the right

workload characteristics and hardware configuration. The other types of changes we've

Search WWH ::

Custom Search

Home