Scaling MySQL - High Performance MySQL

balance, though. You can load balance and yet still connect directly from the application

to the MySQL servers. In fact, centralized load-balancing systems usually work

well only when there's a pool of servers the application can treat as interchangeable. If

the application needs to make a decision such as whether it's safe to perform a read

from a replica server, it usually needs to connect directly to the server.

Besides making special-case logic possible, handling the load-balancing decisions in

the application can actually be very efficient. For example, if you have two identical

replicas, you can choose to use one of them for all queries that touch certain shards

and the other for queries that touch other shards. This makes good use of the replicas'

memory, because each of them caches only a portion of the data from its disks in

memory. And if one of the replicas fails, the other still has all the data required to serve

queries to both shards.
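
As a rough illustration of the shard-affinity idea above, the following Python sketch routes each shard to a preferred replica and falls back to another replica when the preferred one is marked as failed. The class name `ShardAffinityRouter`, the host names, and the modulo-based assignment are all assumptions made for this example; a real application would plug in its own shard map and health checks.

```python
from typing import List, Set


class ShardAffinityRouter:
    """Hypothetical helper: pin each shard to a preferred replica.

    Every replica holds all the data, so any of them can serve any shard;
    the preference only keeps each replica's cache focused on a subset.
    """

    def __init__(self, replicas: List[str]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self.replicas = list(replicas)
        self.down: Set[str] = set()

    def mark_down(self, replica: str) -> None:
        # Called by whatever health check the application uses (assumed).
        self.down.add(replica)

    def mark_up(self, replica: str) -> None:
        self.down.discard(replica)

    def replica_for(self, shard_id: int) -> str:
        n = len(self.replicas)
        # Try the preferred replica first, then the others in a fixed order.
        for offset in range(n):
            candidate = self.replicas[(shard_id + offset) % n]
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no replica is available")


router = ShardAffinityRouter(["replica-a", "replica-b"])
even_shard_host = router.replica_for(0)   # preferred: replica-a
odd_shard_host = router.replica_for(1)    # preferred: replica-b
```

With two replicas, even-numbered shards prefer the first replica and odd-numbered shards prefer the second. If `mark_down("replica-a")` is called, queries for every shard go to `replica-b` until the failed replica is marked up again, at the cost of a colder cache for the shards it did not previously serve.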

The following sections discuss some common ways to connect directly from the application,

and some of the things you should consider as you evaluate each option.

Splitting reads and writes in replication

MySQL replication gives you multiple copies of your data and lets you choose whether

to run a query on the master or a replica. The primary difficulty is how to handle stale

data on the replica, because replication is asynchronous. You should also treat replicas

as read-only, but the master can handle both read and write queries.

You usually have to modify your application so that it's aware of these concerns. The

application can then use the master for writes and split the reads between the master

and the replicas; it can use the replicas when possibly stale data doesn't matter and use

the master for data that has to be up-to-date. We call this read/write splitting.
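
A minimal sketch of read/write splitting in application code might look like the following. It is not a complete solution: the class name `ReadWriteSplitter`, the `needs_fresh` flag, and the rule of classifying a statement by its first keyword are assumptions made for illustration. The sketch only picks a server name; opening the connection is left to the application.

```python
import random
from typing import Optional, Sequence


class ReadWriteSplitter:
    """Hypothetical router: writes go to the master, tolerant reads to replicas."""

    def __init__(
        self,
        master: str,
        replicas: Sequence[str],
        rng: Optional[random.Random] = None,
    ) -> None:
        self.master = master
        self.replicas = list(replicas)
        self.rng = rng or random.Random()

    def choose(self, sql: str, needs_fresh: bool = False) -> str:
        """Return the server that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        # Be conservative: anything that is not a plain SELECT is treated
        # as a write and sent to the master.
        if verb != "SELECT":
            return self.master
        # Reads that cannot tolerate stale data also go to the master.
        if needs_fresh or not self.replicas:
            return self.master
        return self.rng.choice(self.replicas)


splitter = ReadWriteSplitter("master", ["replica-1", "replica-2"])
write_host = splitter.choose("INSERT INTO comments (body) VALUES ('hi')")
fresh_host = splitter.choose("SELECT * FROM comments", needs_fresh=True)
stale_ok_host = splitter.choose("SELECT COUNT(*) FROM comments")
```

The caller, not the router, decides whether a read can tolerate stale data, which matches the point that the application has to be aware of these concerns. Keyword matching is deliberately crude; for example, a `SELECT ... FOR UPDATE` statement would need `needs_fresh=True` to reach the master in this sketch.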

If you use a master-master pair with an active and a passive master, the same considerations

hold. In this configuration, though, only the active server should receive writes.

Reads can go to the passive server if it's OK to read potentially stale data.

The biggest problem is how to avoid artifacts caused by reading stale data. The classic

artifact is when a user makes some change, such as adding a comment to a blog post,

then reloads the page but doesn't see the change because the application read stale data

from a replica.

Some of the most common methods of splitting reads and writes are as follows:

Query-based split

The simplest split is to direct all writes and any reads that can never tolerate stale

data to the active or master server. All other reads go to the replica or passive server.

This strategy is easy to implement, but in practice it won't use the replica as often

as it could, because very few read queries can always tolerate stale data.

Stale-data split

This is a minor enhancement of the query-based split strategy. Relatively little extra

work is required to make the application check the replica's lag and decide whether
