Database Reference
In-Depth Information
Early chunk splitting
A sharded cluster will split chunks aggressively early on to expedite the migration of
data across shards. Specifically, when the number of chunks is less than 10, chunks
will split at one quarter of the max chunk size (16 MB), and when the number of
chunks is between 10 and 20, they'll split at half the maximum chunk size (32 MB).
This has two nice benefits. First, it creates a lot of chunks up front, which initiates a
migration round. Second, that migration round occurs fairly painlessly, as the small
chunk size ensures that the total amount of migrated data is small.
Now the split threshold will increase. You can see how the splitting slows down, and
how chunks start to grow toward their max size, by doing a more massive insert. Try
adding another 800 MB to the cluster:
$ ruby load.rb 800
This will take a lot time to run, so you may want to step away and grab a snack after
starting this load process. By the time it's done, you'll have increased the total data
size by a factor of eight. But if you check the chunking status, you'll see that there are
only around twice as many chunks:
> use config
> db.chunks.count()
21
Given that there are more chunks, the average chunk ranges will be smaller, but each
chunk will include more data. So for example, the first chunk in the collection spans
from Abbott to Bender but it's already nearly 60 MB in size. Because the max chunk
size is currently 64 MB , you'd soon see this chunk split if you were to continue insert-
ing data.
Another thing to notice is that the distribution still looks pretty much even, as it
was before:
> db.chunks.count({"shard": "shard-a"})
11
> db.chunks.count({"shard": "shard-b"})
10
Although the number of chunks has increased during the last 800 MB insert round,
you can probably assume that no migrates occurred; a likely scenario is that each of
the original chunks split in two, with a single extra split somewhere in the mix. You
can verify this by querying the config database's changelog collection:
> db.changelog.count({what: "split"})
20
> db.changelog.find({what: "moveChunk.commit"}).count()
6
 
Search WWH ::




Custom Search