• A very long transaction can cause the reported lag to fluctuate. For example, if you
have a transaction that updates data, stays open for an hour, and then commits,
the update will go into the binary log an hour after it actually happened. When the
replica processes the statement, it will temporarily report that it is an hour behind
the master, and then it will jump back to zero seconds behind.
• If a distribution master is falling behind and has replicas of its own that are caught
up with it, the replicas will report that they are zero seconds behind, even if there
is lag relative to the ultimate master.
The solution to these problems is to ignore Seconds_behind_master and monitor replica
lag with something you can observe and measure directly. The best solution is a heartbeat
record, which is a timestamp that you update once per second on the master. To
calculate the lag, you can simply subtract the heartbeat from the current timestamp on
the replica. This method is immune to all the problems we just mentioned, and it has
the added benefit of creating a handy timestamp that shows to what point in time the
replica's data is current. The pt-heartbeat script, included in Percona Toolkit, is the
most popular implementation of a replication heartbeat.
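As a rough sketch of the idea (this is not pt-heartbeat's actual schema; the heartbeat table and column names here are illustrative), the mechanism can be as simple as:

    -- On the master: a one-row table whose timestamp is refreshed once per
    -- second, for example from cron or a scheduled event.
    CREATE TABLE heartbeat (
      id INT UNSIGNED NOT NULL PRIMARY KEY,
      ts DATETIME NOT NULL
    );
    INSERT INTO heartbeat (id, ts) VALUES (1, NOW());

    -- Run this on the master once per second:
    UPDATE heartbeat SET ts = NOW() WHERE id = 1;

    -- On the replica: lag is the current time minus the most recently
    -- replicated heartbeat timestamp.
    SELECT TIMESTAMPDIFF(SECOND, ts, NOW()) AS lag_seconds
    FROM heartbeat
    WHERE id = 1;

Because the replica's NOW() is compared against a timestamp written on the master, this simple version assumes the two servers' clocks are kept reasonably synchronized (for example, with NTP).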
A heartbeat has other benefits, too. The replication heartbeat records in the binary log
are useful for many purposes, such as disaster recovery in otherwise hard-to-solve
scenarios.
None of the lag metrics we just mentioned gives a sense of how long it will take for a
replica to actually catch up to the master. This depends upon many factors, such as
how powerful the replica is and how many write queries the master continues to process.
See the section “When Will Replicas Begin to Lag?” on page 484 for more on that
topic.
Determining Whether Replicas Are Consistent with the Master
In a perfect world, a replica would always be an exact copy of its master. But in the real
world, errors in replication can cause the replica's data to “drift” out of sync with the
master's. Even if there are apparently no errors, replicas can still get out of sync because
of MySQL features that don't replicate correctly, bugs in MySQL, network corruption,
crashes, ungraceful shutdowns, or other failures.16
Our experience is that this is the rule, not the exception, which means checking your
replicas for consistency with their masters should probably be a routine task. This is
especially important if you use replication for backups, because you don't want to take
backups from a corrupted replica.
MySQL has no built-in method of determining whether one server has the same data
as another server. It does provide some building blocks for checksumming tables and
16. If you're using a nontransactional storage engine, shutting down the server without first running STOP
SLAVE is ungraceful.
 