that affect availability, take a balanced view of the risks, and work on the biggest ones
first. Some people work hard to build software that can handle any kind of hardware
failure, but bugs in this kind of software can cause more downtime than it saves. Some
people build “unsinkable” systems with all kinds of redundancy, but they forget that
the data center can lose power or connectivity. Or maybe they completely forget about
the possibility of malicious attackers or programmer mistakes that delete or corrupt
data—a careless DROP TABLE can cause downtime, too.
Adding redundancy to your system can take two forms: adding spare capacity and
duplicating components. It's actually quite easy to add spare capacity—you can use
any of the techniques we mention throughout this chapter or the previous one. One
way to increase availability is to create a cluster or pool of servers and add a load-
balancing solution. If one server fails, the other servers take over its load. Some people
underutilize components intentionally, because it leaves much more “headroom” to
handle performance problems caused either by increased load or by component
failures.
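The headroom argument comes down to simple arithmetic, sketched below. The sketch assumes that load spreads evenly across the pool and that the load balancer redistributes a failed server's share evenly among the survivors. The function names are hypothetical and exist only for illustration.

```python
def utilization_after_failures(servers: int, utilization: float, failed: int) -> float:
    """Per-server utilization once the survivors absorb the failed servers' load.

    Assumes load is spread evenly before and after the failure.
    """
    if not 0 <= failed < servers:
        raise ValueError("need at least one surviving server")
    total_load = servers * utilization
    return total_load / (servers - failed)


def max_safe_utilization(servers: int, tolerated_failures: int, ceiling: float = 1.0) -> float:
    """Highest per-server utilization that keeps survivors at or below `ceiling`."""
    if not 0 <= tolerated_failures < servers:
        raise ValueError("need at least one surviving server")
    return ceiling * (servers - tolerated_failures) / servers


if __name__ == "__main__":
    # Four servers at 60% each: losing one pushes the other three to about 80%.
    print(utilization_after_failures(servers=4, utilization=0.6, failed=1))
    # To survive one failure without exceeding 100%, run each server at 75% or less.
    print(max_safe_utilization(servers=4, tolerated_failures=1))
```

In practice the ceiling should sit well below 1.0, because response times usually degrade long before a server is fully saturated.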
For many purposes, you will need to duplicate components and have a standby ready
to take over if the main component fails. A duplicated component can be as simple as
a spare network card, router, or hard drive—whatever you think is most likely to fail.
Duplicating entire MySQL servers is a little harder, because a server is useless without
its data. That means you must ensure that your standby servers have access to the
primary server's data. Shared or replicated storage is one popular way to accomplish
this. But is it really a high-availability architecture? Let's dig in and see.
Shared Storage or Replicated Disk
Shared storage is a way to decouple your database server and its storage, usually with
a SAN. With shared storage, the server mounts the filesystem and operates normally.
If the server dies, a standby server can mount the same filesystem, perform any necessary
recovery operations, and start MySQL on the failed server's data. This process is logically
no different from fixing the failed server, except that it's faster because the standby
server is already booted and ready to go. Filesystem checks, InnoDB recovery, and
warmup (see footnote 5) are the biggest delays you're likely to encounter once failover is initiated, but
failure detection itself can take quite a long time in many setups, too.
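Because these steps run one after another, their durations add up, and the slowest one dominates the time to restore full service. The sketch below models that. The step list follows the paragraph above, but every duration is an invented placeholder rather than a measurement, and the `Step` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    name: str
    seconds: float


def time_to_full_service(steps: Sequence[Step]) -> float:
    """Steps run sequentially, so the total is the sum of their durations."""
    return sum(step.seconds for step in steps)


def slowest_step(steps: Sequence[Step]) -> Step:
    """The step that contributes most to the total."""
    return max(steps, key=lambda step: step.seconds)


# Illustrative placeholder durations only; measure your own environment.
EXAMPLE_STEPS = [
    Step("detect failure", 60),
    Step("mount filesystem on standby", 5),
    Step("filesystem check", 120),
    Step("InnoDB recovery", 300),
    Step("warmup", 1800),
]

if __name__ == "__main__":
    total = time_to_full_service(EXAMPLE_STEPS)
    worst = slowest_step(EXAMPLE_STEPS)
    print(f"total: {total:.0f}s, dominated by {worst.name} ({worst.seconds:.0f}s)")
```

Replacing the placeholder durations with timings from a rehearsed failover shows which step is worth optimizing first.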
Shared storage has two advantages: it helps avoid data loss from the failure of any
component other than the storage, and it makes it possible to build redundancy in the
non-storage components. As a result, it's a strategy for helping to reduce availability
requirements in some parts of the system, making it easier to achieve high availability
by concentrating your efforts on a smaller set of components. But the shared storage
5. Percona Server offers a feature to restore the buffer pool to its saved state after a restart, and this works
fine with shared storage. This can reduce warmup time by hours or days. MySQL 5.6 will have a similar
feature.
 