Scaling MySQL - High Performance MySQL

Databases Reference

In-Depth Information

Fixed allocation

There are two main ways to allocate data to shards: the fixed and dynamic allocation

strategies. Both require a partitioning function that takes a row's partitioning key as

input and returns the shard that holds the row. 9

Fixed allocation uses a partitioning function that depends only on the partitioning key's

value. Hash functions and modulus are good examples. These functions map each value

of the partitioning key into a limited number of “buckets” that can hold the data.

Suppose you want 100 buckets, and you want to find out where to put user 111. If

you're using a modulus, the answer is easy: 111 modulus 100 is 11, so you should place

the user into shard 11.

If, on the other hand, you're using the CRC32() function for hashing, the answer is 81:

mysql> SELECT CRC32(111) % 100;

+------------------+

| CRC32(111) % 100 |

+------------------+

| 81 |

+------------------+

The primary advantages of a fixed strategy are simplicity and low overhead. You can

also hardcode it into the application.

However, a fixed allocation strategy has disadvantages, too:

• If the shards are large and there are few of them, it can be hard to balance the load

across shards.

• Fixed allocation doesn't let you decide where to store each piece of data, which is

important for applications that don't have a very uniform load on the unit of

sharding. Some pieces of data will likely be much more active than others, and if

many of those happen to fall into the same shard, a fixed allocation strategy doesn't

let you ease the strain by moving some of them to another shard. (This is not as

much of a problem when you have many small pieces of data in each shard, because

the law of large numbers will help even things out.)

• It's usually harder to change the sharding, because it requires reallocating existing

data. For example, if you've sharded by a hash function modulus 10, you'll have

10 shards. If the application grows and the shards get too large, you might want

to increase the number of shards to 20. That will require rehashing everything,

updating a lot of data, and moving data between shards.

Because of these limitations, we usually prefer dynamic allocation for new applications.

But if you're sharding an existing application, you might find it easier to build a fixed

9. We're using “function” in its mathematical sense here to refer to a mapping from the input (domain) to

the output (range). As you'll see, you can create such a function in many ways, including using a lookup

table in your database.

Search WWH ::

Custom Search

Home