Database Reference
In-Depth Information
Splitting and migrating
At the heart of the sharding mechanism are the splitting and migration of chunks.
First, consider the idea of splitting chunks. When you initially set up a sharded
cluster, just one chunk exists. That one chunk's range encompasses the entire sharded
collection. How then do you arrive at a sharded cluster with multiple chunks? The
answer is that chunks are split once they reach a certain size threshold. The default
max chunk size is 64 MB or 100,000 documents, whichever comes first. As data is
added to a new sharded cluster, the original chunk eventually reaches one of these
thresholds, triggering a chunk split. Splitting a chunk is a simple operation; it basically
involves dividing the original range into two ranges so that two chunks, each repre-
senting the same number of documents, are created.
Note that chunk splitting is logical . When MongoDB splits a chunk, it merely modi-
fies the chunk metadata so that one chunk becomes two. Splitting a chunk, therefore,
does not affect the physical ordering of the documents in a sharded collection. This
means that splitting is simple and fast.
But as you'll recall, one of the biggest difficulties in designing a sharding system is
ensuring that data is always evenly balanced. MongoDB's sharded clusters balance by
moving chunks between shards. We call this migrating , and unlike splitting, it's a real,
physical operation.
Migrations are managed by a software process known as the balancer . The bal-
ancer's job is to ensure that data remains evenly distributed across shards. It accom-
plishes this by keeping track of the number of chunks on each shard. Though the
heuristic varies somewhat depending on total data size, the balancer will usually
initiate a balancing round when the difference between the shard with the greatest
number of chunks and the shard with the least number of chunks is greater than
eight. During the balancing round, chunks are migrated from the shard with the
greater number of chunks to the shard with fewer chunks until the two shards are
roughly even.
Don't worry too much if this doesn't make sense yet. The next section, where I'll
demonstrate sharding with a sample cluster, will bring the concepts of the shard key
and chunking into the light of practice.
9.2
A sample shard cluster
The best way to get a handle on sharding is to see how it works in action. Fortunately,
it's possible to set up a sharded cluster on a single machine, and that's exactly what
we'll do now. 4 We'll then simulate the behavior of the sample cloud-based spreadsheet
application described in the previous section. Along the way, we'll examine the global
shard configuration and see first-hand how data is partitioned based on the shard key.
4
The idea is that you can run every mongod and mongos process on a single machine for testing. Later in the
chapter, we'll look at production sharding configurations and the minimum number of machines required
for a viable deployment.
Search WWH ::




Custom Search