Databases Reference
In-Depth Information
allocation strategy instead of a dynamic one, because it's simpler. That said, most ap-
plications that use fixed allocation end up with a dynamic allocation strategy sooner
or later.
Dynamic allocation
The alternative to fixed allocation is a dynamic allocation strategy that you store sep-
arately, mapping each unit of data to a shard. An example is a two-column table of user
IDs and shard IDs:
CREATE TABLE user_to_shard (
user_id INT NOT NULL,
shard_id INT NOT NULL,
PRIMARY KEY (user_id)
);
The table itself is the partitioning function. Given a value for the partitioning key (the
user ID), you can find the shard ID. If the row doesn't exist, you can pick the desired
shard and add it to the table. You can also change it later—that's what makes this a
dynamic allocation strategy.
Dynamic allocation adds overhead to the partitioning function because it requires a
call to an external resource, such as a directory server (a data storage node that stores
the mapping). Such an architecture often needs more layers for efficiency. For example,
you can use a distributed caching system to store the directory server's data in memory,
because in practice it doesn't change all that much. Or—perhaps more commonly—
you can just add a shard_id column to the users table and store it there.
The biggest advantage of dynamic allocation is fine-grained control over where the data
is stored. This makes it easier to allocate data to the shards evenly and gives you a lot
of flexibility to accommodate changes you don't foresee.
A dynamic mapping also lets you build multiple levels of sharding strategies on top of
the simple key-to-shard mapping. For example, you can build a dual mapping that
assigns each unit of sharding to a group (e.g., a group of users in the book club), and
then keeps the groups together on a shard where possible. This lets you take advantage
of shard affinities, so you can avoid cross-shard queries.
If you use a dynamic allocation strategy, you can have imbalanced shards. This can be
useful when your servers aren't all equally powerful, or when you want to use some of
them for different purposes, such as archived data. If you also have the ability to reba-
lance shards at any time, you can maintain a one-to-one mapping of shards to nodes
without wasting capacity. Some people prefer the simplicity of one shard per node.
(But remember, there are advantages to keeping shards small.)
Dynamic allocation and smart use of shard affinities can prevent your cross-shard
queries from growing as you scale. Imagine a cross-shard query in a data store with
four nodes. In a fixed allocation, any given query might require touching all shards, but
a dynamic allocation strategy might let you run the same query on only three of the
 
Search WWH ::




Custom Search