Databases Reference
In-Depth Information
Fixed allocation
There are two main ways to allocate data to shards: the fixed and dynamic allocation
strategies. Both require a partitioning function that takes a row's partitioning key as
input and returns the shard that holds the row. 9
Fixed allocation uses a partitioning function that depends only on the partitioning key's
value. Hash functions and modulus are good examples. These functions map each value
of the partitioning key into a limited number of “buckets” that can hold the data.
Suppose you want 100 buckets, and you want to find out where to put user 111. If
you're using a modulus, the answer is easy: 111 modulus 100 is 11, so you should place
the user into shard 11.
If, on the other hand, you're using the CRC32() function for hashing, the answer is 81:
mysql> SELECT CRC32(111) % 100;
+------------------+
| CRC32(111) % 100 |
+------------------+
| 81 |
+------------------+
The primary advantages of a fixed strategy are simplicity and low overhead. You can
also hardcode it into the application.
However, a fixed allocation strategy has disadvantages, too:
• If the shards are large and there are few of them, it can be hard to balance the load
across shards.
• Fixed allocation doesn't let you decide where to store each piece of data, which is
important for applications that don't have a very uniform load on the unit of
sharding. Some pieces of data will likely be much more active than others, and if
many of those happen to fall into the same shard, a fixed allocation strategy doesn't
let you ease the strain by moving some of them to another shard. (This is not as
much of a problem when you have many small pieces of data in each shard, because
the law of large numbers will help even things out.)
• It's usually harder to change the sharding, because it requires reallocating existing
data. For example, if you've sharded by a hash function modulus 10, you'll have
10 shards. If the application grows and the shards get too large, you might want
to increase the number of shards to 20. That will require rehashing everything,
updating a lot of data, and moving data between shards.
Because of these limitations, we usually prefer dynamic allocation for new applications.
But if you're sharding an existing application, you might find it easier to build a fixed
9. We're using “function” in its mathematical sense here to refer to a mapping from the input (domain) to
the output (range). As you'll see, you can create such a function in many ways, including using a lookup
table in your database.
 
Search WWH ::




Custom Search