High Availability - High Performance MySQL

redundancy, you can stop using the failed piece and start using its redundant standby

instead. The combination of redundancy and failover can enable you to recover more

quickly, and as you know, reducing MTTR reduces downtime and improves

availability.
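
To make the effect of MTTR concrete, here is a minimal sketch that uses the standard steady-state definition of availability, MTBF / (MTBF + MTTR). The function names and the sample figures (1,000 hours between failures, 4 hours to recover without failover, 30 minutes with it) are assumptions chosen for illustration, not measurements.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given mean time between
    failures (MTBF) and mean time to recovery (MTTR), both in hours."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * 24 * 365


# Illustrative figures only: same failure rate, two recovery times.
without_failover = annual_downtime_hours(mtbf_hours=1000, mttr_hours=4.0)
with_failover = annual_downtime_hours(mtbf_hours=1000, mttr_hours=0.5)
print(f"without failover: {without_failover:.1f} h/year")
print(f"with failover:    {with_failover:.1f} h/year")
```

The failure rate is identical in both cases; only the recovery time changes, and that alone accounts for the difference in downtime.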

Before we continue, we should talk about a few terms. We use “failover” consistently;

some people use “fallback” as a synonym. Sometimes people also say “switchover” to

denote a switch that's planned instead of a response to a failure. Po-tay-toe,

po-tah-toe. We also use the term “failback” to indicate the reverse of failover. If you

have failback capability, failover can be a two-way process: when server A fails and

server B replaces it, you can repair server A and fail back to it.

Failover is good for more than just recovery from failures. You can also do planned

failovers to reduce downtime (improve availability) for events such as upgrades, schema

changes, application modifications, or scheduled maintenance.

You need to identify how fast failover needs to be, but you also need to know how

quickly you have to replace the failed component after a failover. Until you restore the

system's depleted standby capacity, you have less redundancy and you're exposed to

extra risk. Thus, having a standby doesn't eliminate the need for timely replacement

of failed components. How quickly can you build a new standby server, install its operating

system, and give it a fresh copy of your data? Do you have enough standby

machines? You might need more than one.

Failover comes in many flavors. We've already discussed several of them, because load

balancing and failover are similar in many ways, and the line between them is a bit

fuzzy. In general, we think a full failover solution, at a minimum, needs to be able to

monitor and automatically replace a component. This should ideally be transparent to

the application. Load balancing need not provide this capability.
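
The "monitor and automatically replace" requirement can be sketched roughly as follows. This is a toy model, not how Pacemaker or any other real tool works: the `FailoverMonitor` class, the `is_healthy` probe, and the `max_failures` threshold are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverMonitor:
    """Toy monitor: promotes a standby after repeated failed health checks."""
    active: str
    standbys: List[str]
    is_healthy: Callable[[str], bool]  # hypothetical probe supplied by the caller
    max_failures: int = 3              # consecutive failures tolerated before failover
    failed: List[str] = field(default_factory=list)
    _failures: int = 0

    def tick(self) -> str:
        """Run one health check and return the name of the active server."""
        if self.is_healthy(self.active):
            self._failures = 0
            return self.active
        self._failures += 1
        if self._failures >= self.max_failures and self.standbys:
            # Automatic replacement: retire the failed server, promote a standby.
            self.failed.append(self.active)
            self.active = self.standbys.pop(0)
            self._failures = 0
        return self.active


# Simulated outage: db1 is down, db2 is the only standby.
down = {"db1"}
monitor = FailoverMonitor("db1", ["db2"], lambda name: name not in down,
                          max_failures=2)
monitor.tick()  # first failed check; db1 is still considered active
monitor.tick()  # second failed check; db2 is promoted
print(monitor.active, monitor.standbys, monitor.failed)
```

Note that after the promotion the list of standbys is empty. That is the depleted-redundancy state described earlier: the system keeps running, but it cannot survive another failure until someone replaces the failed server.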

In the Unix world, failover is often accomplished with the tools provided by the High

Availability Linux project (http://linux-ha.org), which run on many Unix-like operating

systems, not just Linux. The Linux-HA stack has become significantly more featureful

in the last few years. Today most people think of Pacemaker as the main component

in the stack. Pacemaker replaces the older heartbeat tool. Various other tools accomplish

IP takeover and load-balancing functionality. You can combine them with DRBD

and/or LVS.

The most important part of failover is failback. If you can't switch back and forth

between servers at will, failover is a dead end and only postpones downtime. This is

why we like symmetrical replication topologies, such as the dual-master configuration,

and we dislike ring replication with three or more co-masters. If the configuration is

symmetrical, failover and failback are the same operation in opposite directions. (It's

worth mentioning that DRBD has built-in failback capabilities.)
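
The symmetry argument is easy to see in a small model. In the sketch below, the `DualMasterPair` class and its `switch_roles` method are hypothetical names for illustration; a real role switch also has to stop writes on the old active server and let replication catch up, which this model leaves out.

```python
from dataclasses import dataclass


@dataclass
class DualMasterPair:
    """Toy model of a symmetrical two-server configuration."""
    active: str
    passive: str

    def switch_roles(self) -> None:
        """Swap the roles. The same step serves as failover and failback."""
        self.active, self.passive = self.passive, self.active


pair = DualMasterPair(active="A", passive="B")
pair.switch_roles()  # failover: B takes over from A
after_failover = pair.active
pair.switch_roles()  # failback: the identical operation returns control to A
after_failback = pair.active
print(after_failover, after_failback)
```

Because there is only one operation, there is only one procedure to script and rehearse. An asymmetrical topology needs a separate procedure for the return trip.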

In some applications, it's critical that failover and failback be as fast and atomic as

possible. Even when it's not critical, it's still a good idea not to rely on things that are
