Data sharding
Data sharding[7] is the most common and successful approach for scaling today's very
large MySQL applications. You shard the data by splitting it into smaller pieces, or
shards, and storing them on different nodes.
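The split is usually driven by a partitioning key such as a user ID. As a minimal sketch of this idea, the routing below maps a user ID to one of a fixed set of shard hosts with a simple modulo; the DSNs, shard count, and function name are all hypothetical, and real deployments more often use a directory service or consistent hashing so that shards can be moved or added without rehashing everything:

```python
# Hypothetical shard hosts; in practice these would come from
# configuration or a directory service, not a hardcoded list.
SHARD_DSNS = [
    "mysql://shard0.example.com/blog",
    "mysql://shard1.example.com/blog",
    "mysql://shard2.example.com/blog",
    "mysql://shard3.example.com/blog",
]

def shard_for_user(user_id: int) -> str:
    """Map a user ID to the DSN of the node holding that user's rows."""
    return SHARD_DSNS[user_id % len(SHARD_DSNS)]
```

Every query for that user's data then goes to the one node returned by `shard_for_user()`, which is what keeps each node small enough to manage.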
Sharding works well when combined with some type of functional partitioning. Most
sharded systems also have some “global” data that isn't sharded at all (say, lists of cities,
or login data). This global data is usually stored on a single node, often behind a cache
such as memcached.
In fact, most applications shard only the data that needs sharding—typically, the parts
of the dataset that will grow very large. Suppose you're building a blogging service. If
you expect 10 million users, you might not need to shard the user registration
information because you might be able to fit all of the users (or the active subset of them)
entirely in memory. If you expect 500 million users, on the other hand, you should
probably shard this data. The user-generated content, such as posts and comments,
will almost certainly require sharding in either case, because these records are much
larger and there are many more of them.
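This selective approach can be expressed as a routing rule: small tables go to the global node, fast-growing tables go to a shard. The sketch below assumes the blogging-service split described above; the node names, table set, and shard count are illustrative, not a prescribed layout:

```python
# Hypothetical node names for the blogging-service example: the
# small "users" table stays global, while the fast-growing tables
# are sharded by user ID.
GLOBAL_NODE = "global-master"
SHARDED_TABLES = {"posts", "comments"}
NUM_SHARDS = 4

def node_for(table: str, user_id: int) -> str:
    """Return the node that stores this table's rows for this user."""
    if table in SHARDED_TABLES:
        return f"shard-{user_id % NUM_SHARDS}"
    return GLOBAL_NODE
```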
Large applications might have several logical datasets that you can shard differently.
You can store them on different sets of servers, but you don't have to. You can also
shard the same data multiple ways, depending on how you access it. We show an
example of this approach later.
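One way to shard the same data two ways, sketched here under assumed names, is to write each row twice under two different keys. For a private-messaging feature, for instance, a copy sharded by sender serves "messages I sent" and a copy sharded by recipient serves "my inbox," so each query stays on a single shard:

```python
NUM_SHARDS = 4  # hypothetical shard count

def shards_for_message(sender_id: int, recipient_id: int) -> tuple:
    """Return the two shards that must each receive a copy of the
    message: one keyed by sender, one keyed by recipient."""
    return (sender_id % NUM_SHARDS, recipient_id % NUM_SHARDS)
```

The cost of this approach is double the writes and storage, which is often worth it to avoid cross-shard queries on the read path.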
Sharding is dramatically different from the way most applications are designed initially,
and it can be difficult to change an application from a monolithic data store to a sharded
architecture. That's why it's much easier to build an application with a sharded data
store from the start if you anticipate that it will eventually need one.
Most applications that don't build in sharding from the beginning go through stages
as they get larger. For example, you can use replication to scale read queries on your
blogging service until it doesn't work any more. Then you can split the service into
three parts: users, posts, and comments. You can place these on different servers
(functional partitioning), perhaps with a service-oriented architecture, and perform the joins
in the application. Figure 11-6 shows the evolution from a single server to functional
partitioning.
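"Performing the joins in the application" means fetching from each server separately and stitching the rows together in code rather than in SQL. The sketch below uses in-memory dicts as stand-ins for the posts and users servers; the data and function names are invented for illustration:

```python
# Stand-ins for two functionally partitioned databases: in a real
# deployment these would be queries against separate MySQL servers.
posts_db = [
    {"post_id": 1, "user_id": 10, "title": "Hello"},
    {"post_id": 2, "user_id": 11, "title": "Sharding 101"},
]
users_db = {
    10: {"user_id": 10, "name": "alice"},
    11: {"user_id": 11, "name": "bob"},
}

def recent_posts_with_authors(limit: int = 10) -> list:
    """Join posts to their authors in the application layer."""
    # Step 1: SELECT ... FROM posts LIMIT n  (posts server)
    posts = posts_db[:limit]
    # Step 2: SELECT ... WHERE user_id IN (...)  (users server)
    user_ids = {p["user_id"] for p in posts}
    users = {uid: users_db[uid] for uid in user_ids}
    # Step 3: stitch the two result sets together in code
    return [{**p, "author": users[p["user_id"]]["name"]} for p in posts]
```

The batched `IN (...)` lookup in step 2 is the important detail: it keeps the application-side join to two round trips instead of one query per post.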
Finally, you can shard the posts and comments by the user ID, and keep the user
information on a single node. If you keep a master-replica configuration for the global
node and use master-master pairs for the sharded nodes, the final data store might look
like Figure 11-7.
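That final topology can be captured in a small routing table, sketched here with invented host names: writes to global data go to the global master, while writes to sharded data go to the currently active side of the shard's master-master pair:

```python
# Hypothetical final topology: one master-replica pair for the
# global node, master-master pairs for the shards. Only one side
# of each pair takes writes at a time.
TOPOLOGY = {
    "global": {"master": "global-m", "replica": "global-r"},
    "shards": [
        {"active": "shard0-a", "standby": "shard0-b"},
        {"active": "shard1-a", "standby": "shard1-b"},
    ],
}

def write_host(user_id: int = None) -> str:
    """Host that accepts writes: the global master for unsharded
    data, the active half of the pair for a user's shard."""
    if user_id is None:
        return TOPOLOGY["global"]["master"]
    pair = TOPOLOGY["shards"][user_id % len(TOPOLOGY["shards"])]
    return pair["active"]
```

Keeping the pairs master-master means a shard can fail over by flipping `active` and `standby` in this table, without any data reload.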
[7] Sharding is also called "splintering" and "partitioning," but we use the term "sharding" to avoid
confusion. Google calls it sharding, and if it's good enough for Google, it's good enough for us.