Databases Reference
In-Depth Information
Now suppose you want to create a comment for this user and store it in the same shard.
The application can assign the comment ID 5 for the user, and combine the value 5
with the shard ID 11 in the same way.
The benefit of this approach is that each object's ID carries its partitioning key along
with it, whereas other approaches usually require a join or another lookup to find the
partitioning key. If you want to retrieve a certain comment from the database, you don't
need to know which user owns it; the object's ID tells you where to find it. If the object
were sharded dynamically by user ID, you'd have to find the comment's user, then ask
the directory server which shard to look on.
Another solution is to store the partitioning key together with the object in separate
columns. For example, you'd never refer just to comment 5, but to comment 5 be-
longing to user 3. This approach will probably make some people happier, because it
doesn't violate first normal form; however, the extra column causes more overhead,
coding, and inconvenience. (This is one case where we feel there's an advantage to
storing two values in a single column.)
The drawback of explicit allocation is that the sharding is fixed, and it's harder to
balance shards. On the other hand, this approach works well with the combination of
fixed and dynamic allocation. Instead of hashing to a fixed number of buckets and
mapping these to nodes, you encode the bucket as part of each object. This gives the
application control over where the data is located, so it can place related data together
on the same shard.
BoardReader ( http://boardreader.com ) uses a variation of this technique: it encodes the
partitioning key in the Sphinx document ID. This makes it easy to find each search
result's related data in the sharded data store. See Appendix F for more on Sphinx.
We've described mixed allocation because we've seen cases where it's useful, but nor-
mally we don't recommend it. We like to use dynamic allocation when possible, and
avoid explicit allocation.
Rebalancing shards
If necessary, you can move data to different shards to rebalance the load. For example,
many readers have probably heard developers from large photo-sharing sites or popular
social networking sites mention their tools for moving users to different shards.
The ability to move data between shards has its benefits. For example, it can help you
upgrade your hardware by enabling you to move users off the old shard onto the new
one without taking the whole shard down or making it read-only.
However, we like to avoid rebalancing shards if possible, because it can disrupt service
to your users. Moving data between shards also makes it harder to add features to the
application, because new features might have to include an upgrade to the rebalance
script. If you keep your shards small enough, you might not need to do this; you can
 
Search WWH ::




Custom Search