Databases Reference
In-Depth Information
When Will Replicas Begin to Lag?
A common question about replicas is how to predict when they won't be able to keep
up with the changes coming from the master. It can be hard to tell the difference be-
tween a replica that's at 5% of its capacity and one that's at 95%. However, it's possible
to get at least a little advance warning of impending saturation and estimate replication
capacity.
The first thing you should do is watch for spikes of lag. If you have graphs of replication
lag, you should notice little bumps in the graphs as the replica begins to encounter short
periods where there's more work and it can't keep up. As the workload gets closer to
consuming the replica's capacity, you'll see these bumps get higher and wider. The
front side of the bump will generally have a consistent angle, but the back side, when
the replica is catching up after lagging behind, will become a gentler and gentler slope.
The presence of these bumps, and growth in them, is a warning sign that you're ap-
proaching your limits.
To predict what's going to happen at some point in the future, deliberately delay a
replica, and then see how fast it can catch up. The goal is to explicitly see how steep
the back side of that slope is. If you stop a replica for an hour, then start it and it catches
up in one hour, it is running at half of its capacity. That is, if you stop it at noon and
restart it at 1:00, and it's caught up again at 2:00, it has applied all of the changes from
12:00 to 2:00 in an hour, so it went at double speed.
Finally, in Percona Server and MariaDB you can measure the replication utilization
directly. Enable the userstat server variable, and then you'll be able to do the following:
mysql> SELECT * FROM INFORMATION_SCHEMA.USER_STATISTICS
-> WHERE USER='#mysql_system#'\G
*************************** 1. row ***************************
USER: #mysql_system#
TOTAL_CONNECTIONS: 1
CONCURRENT_CONNECTIONS: 2
CONNECTED_TIME: 46188
BUSY_TIME: 719
ROWS_FETCHED: 0
ROWS_UPDATED: 1882292
SELECT_COMMANDS: 0
UPDATE_COMMANDS: 580431
OTHER_COMMANDS: 338857
COMMIT_TRANSACTIONS: 1016571
ROLLBACK_TRANSACTIONS: 0
You can compare the BUSY_TIME to one-half of the CONNECTED_TIME (because there are
two replication threads on the replica) to see how much of the time the replication
thread was actively processing statements. 15 In our example, the replica is using around
15. If the replication threads are always running, you can just use the server's uptime instead of half the
CONNECTED_TIME .
 
Search WWH ::




Custom Search