Scaling MySQL - High Performance MySQL

Databases Reference

In-Depth Information

Now suppose you want to create a comment for this user and store it in the same shard.

The application can assign the comment ID 5 for the user, and combine the value 5

with the shard ID 11 in the same way.

The benefit of this approach is that each object's ID carries its partitioning key along

with it, whereas other approaches usually require a join or another lookup to find the

partitioning key. If you want to retrieve a certain comment from the database, you don't

need to know which user owns it; the object's ID tells you where to find it. If the object

were sharded dynamically by user ID, you'd have to find the comment's user, then ask

the directory server which shard to look on.

Another solution is to store the partitioning key together with the object in separate

columns. For example, you'd never refer just to comment 5, but to comment 5 be-

longing to user 3. This approach will probably make some people happier, because it

doesn't violate first normal form; however, the extra column causes more overhead,

coding, and inconvenience. (This is one case where we feel there's an advantage to

storing two values in a single column.)

The drawback of explicit allocation is that the sharding is fixed, and it's harder to

balance shards. On the other hand, this approach works well with the combination of

fixed and dynamic allocation. Instead of hashing to a fixed number of buckets and

mapping these to nodes, you encode the bucket as part of each object. This gives the

application control over where the data is located, so it can place related data together

on the same shard.

BoardReader ( http://boardreader.com ) uses a variation of this technique: it encodes the

partitioning key in the Sphinx document ID. This makes it easy to find each search

result's related data in the sharded data store. See Appendix F for more on Sphinx.

We've described mixed allocation because we've seen cases where it's useful, but nor-

mally we don't recommend it. We like to use dynamic allocation when possible, and

avoid explicit allocation.

Rebalancing shards

If necessary, you can move data to different shards to rebalance the load. For example,

many readers have probably heard developers from large photo-sharing sites or popular

social networking sites mention their tools for moving users to different shards.

The ability to move data between shards has its benefits. For example, it can help you

upgrade your hardware by enabling you to move users off the old shard onto the new

one without taking the whole shard down or making it read-only.

However, we like to avoid rebalancing shards if possible, because it can disrupt service

to your users. Moving data between shards also makes it harder to add features to the

application, because new features might have to include an upgrade to the rebalance

script. If you keep your shards small enough, you might not need to do this; you can

Search WWH ::

Custom Search

Home