balance, though. You can load balance and yet still connect directly from the application
to the MySQL servers. In fact, centralized load-balancing systems usually work
well only when there's a pool of servers the application can treat as interchangeable. If
the application needs to make a decision such as whether it's safe to perform a read
from a replica server, it usually needs to connect directly to the server.
Besides making special-case logic possible, handling the load-balancing decisions in
the application can actually be very efficient. For example, if you have two identical
replicas, you can choose to use one of them for all queries that touch certain shards
and the other for queries that touch other shards. This makes good use of the replicas'
memory, because each of them caches only a portion of the data from its disks in
memory. And if one of the replicas fails, the other still has all the data required to serve
queries to both shards.
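The shard-affinity idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host names, the integer shard IDs, and the is_up health-check callback are all hypothetical and would come from your own application.

```python
# Hypothetical replica host names; both replicas hold identical data.
REPLICAS = ["replica1", "replica2"]


def choose_replica(shard_id, is_up, replicas=REPLICAS):
    """Return the preferred replica for a shard, falling back if it is down.

    shard_id is assumed to be a non-negative integer, and is_up is an
    application-supplied callable (host -> bool); both are assumptions.
    """
    first = shard_id % len(replicas)
    # Try the shard's preferred replica first so its cache stays warm for
    # that shard, then try the remaining replicas in order.
    for offset in range(len(replicas)):
        host = replicas[(first + offset) % len(replicas)]
        if is_up(host):
            return host
    raise RuntimeError("no replica available")
```

With both replicas healthy, even-numbered shards go to replica1 and odd-numbered shards go to replica2. If replica1 is down, its shards are served by replica2, which holds the same data but starts with a cold cache for them.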
The following sections discuss some common ways to connect directly from the
application, and some of the things you should consider as you evaluate each option.
Splitting reads and writes in replication
MySQL replication gives you multiple copies of your data and lets you choose whether
to run a query on the master or a replica. The primary difficulty is how to handle stale
data on the replica, because replication is asynchronous. You should also treat replicas
as read-only, but the master can handle both read and write queries.
You usually have to modify your application so that it's aware of these concerns. The
application can then use the master for writes and split the reads between the master
and the replicas; it can use the replicas when possibly stale data doesn't matter and use
the master for data that has to be up-to-date. We call this read/write splitting.
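As a rough illustration of read/write splitting inside the application, the sketch below routes each statement to a server name. The class name, the needs_fresh flag, and the host names are hypothetical, and treating every non-SELECT statement as a write is a simplifying assumption, not a complete classification of SQL.

```python
import itertools


class ReadWriteSplitter:
    """Pick a server for each statement (illustrative sketch only)."""

    def __init__(self, master, replicas):
        self.master = master
        # Rotate through the replicas so stale-tolerant reads are spread out.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def server_for(self, sql, needs_fresh=False):
        """Return the host that should run this statement.

        needs_fresh is set by the caller when the read must not see stale
        data, for example right after the same user has made a change.
        """
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Anything that is not a plain SELECT is treated as a write.
        if verb != "SELECT" or needs_fresh or self._replicas is None:
            return self.master
        return next(self._replicas)
```

For example, a splitter built with ReadWriteSplitter("master", ["replica1", "replica2"]) sends an INSERT to the master, alternates plain SELECT statements between the two replicas, and sends a SELECT issued with needs_fresh=True to the master. A real application would also have to send reads that take locks, such as SELECT ... FOR UPDATE, to the master; this sketch does not handle that case.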
If you use a master-master pair with an active and a passive master, the same
considerations hold. In this configuration, though, only the active server should receive writes.
Reads can go to the passive server if it's OK to read potentially stale data.
The biggest problem is how to avoid artifacts caused by reading stale data. The classic
artifact is when a user makes some change, such as adding a comment to a blog post,
then reloads the page but doesn't see the change because the application read stale data
from a replica.
Some of the most common methods of splitting reads and writes are as follows:
Query-based split
The simplest split is to direct all writes and any reads that can never tolerate stale
data to the active or master server. All other reads go to the replica or passive server.
This strategy is easy to implement, but in practice it won't use the replica as often
as it could, because very few read queries can always tolerate stale data.
Stale-data split
This is a minor enhancement of the query-based split strategy. Relatively little extra
work is required to make the application check the replica's lag and decide whether
 