Sharding - MongoDB in Action

Given this, it should be clear that sharding a collection at the last minute isn't a

good response to a performance problem. If you plan on sharding a collection at

some point in the future, you should do so well in advance of any anticipated

performance degradation.

Presplitting chunks for initial load

If you have a large data set that you need to load into a sharded collection, and you

know something about the distribution of the data, then you can save a lot of time by

presplitting and then premigrating chunks. For example, suppose you wanted to

import the spreadsheet data into a fresh MongoDB shard cluster. You can ensure that

the data distributes evenly upon import by first splitting and then migrating chunks

across shards. You can use the split and moveChunk commands to accomplish this.

These are aliased by the sh.splitAt() and sh.moveChunk() helpers, respectively.

Here's an example of a manual chunk split. You issue the split command, specify

the collection you want, and then indicate a split point:

> sh.splitAt( "cloud-docs.spreadsheets",

{ "username" : "Chen", "_id" : ObjectId("4d6d59db1d41c8536f001453") })

When run, this command will locate the chunk that logically contains the document

where username is Chen and _id is ObjectId("4d6d59db1d41c8536f001453").[16] It

then splits the chunk at that point, which results in two chunks. You can continue

splitting like this until you have a set of chunks that nicely distribute the data. You'll

want to make sure that you've created enough chunks to keep the average chunk size

well within the 64 MB split threshold. Thus if you expect to load 1 GB of data, you

should plan to create around 20 chunks.
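
The sizing arithmetic and the choice of split points can be sketched in plain JavaScript, which also runs in the mongo shell. This is only an illustration: the helper names (minChunkCount, averageChunkSizeMB, pickSplitPoints) and the sample usernames are hypothetical, and the sketch assumes you have a sorted sample of usernames that reflects how the real data is distributed.

```javascript
// Minimal sketch: sizing a presplit and choosing split points.
// All helper names and sample values are hypothetical, not part of MongoDB.

// Smallest chunk count that keeps every chunk at or under the threshold,
// assuming the data is spread perfectly evenly.
function minChunkCount(totalMB, thresholdMB) {
  return Math.ceil(totalMB / thresholdMB);
}

// Average chunk size when totalMB of data is spread over chunkCount chunks.
function averageChunkSizeMB(totalMB, chunkCount) {
  return totalMB / chunkCount;
}

// Pick up to chunkCount - 1 evenly spaced split points from a sorted sample
// of shard-key values (here, usernames).
function pickSplitPoints(sortedKeys, chunkCount) {
  const points = [];
  for (let i = 1; i < chunkCount; i++) {
    const idx = Math.floor((i * sortedKeys.length) / chunkCount);
    const key = sortedKeys[idx];
    // Skip repeats so that every split point is distinct.
    if (points[points.length - 1] !== key) points.push(key);
  }
  return points;
}

const sample = ["Abbott", "Chen", "Garcia", "Kim", "Nguyen", "Patel", "Smith", "Young"];
const splitPoints = pickSplitPoints(sample, 4);

// In the mongo shell, each point could then be applied with the helper shown
// earlier (assumes a compound shard key on username and _id):
// splitPoints.forEach(function (u) {
//   sh.splitAt("cloud-docs.spreadsheets", { username: u, _id: MinKey });
// });
```

For 1 GB (1,024 MB) of data and a 64 MB threshold, 16 chunks is the bare minimum. Creating 20 chunks brings the average down to roughly 51 MB, which leaves some headroom for uneven distribution.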

The second step is to ensure that all shards have roughly the same number of

chunks. Since all chunks will initially reside on the same shard, you'll need to move

them. Each chunk can be moved using the moveChunk command. The helper method

simplifies this:

> sh.moveChunk("cloud-docs.spreadsheets", {username: "Chen"}, "shardB")

This says that you want to move the chunk that logically would contain the document

{username: "Chen"} to shard B.
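
When there are many chunks, it may help to compute the moves first and then issue them in a loop. The following is a minimal sketch of a round-robin plan, not a definitive procedure: planMoves is a hypothetical helper, the shard names are illustrative, and it assumes that every chunk starts out on the first shard in the list and that the split points were created at { username: <point>, _id: MinKey }.

```javascript
// Minimal sketch: spreading presplit chunks across shards round-robin.
// planMoves is a hypothetical helper; shard names are illustrative.
function planMoves(splitPoints, shards) {
  const moves = [];
  splitPoints.forEach(function (point, i) {
    // Chunk 0 (everything below the first split point) stays on shards[0].
    // The chunk whose lower bound is splitPoints[i] is chunk i + 1.
    const target = shards[(i + 1) % shards.length];
    // Assumption: all chunks begin on shards[0], so those need no move.
    if (target !== shards[0]) moves.push({ lowerBound: point, to: target });
  });
  return moves;
}

const moves = planMoves(["Garcia", "Nguyen", "Smith"], ["shardA", "shardB"]);

// In the mongo shell, the plan could then be applied with the helper above:
// moves.forEach(function (m) {
//   sh.moveChunk("cloud-docs.spreadsheets",
//                { username: m.lowerBound, _id: MinKey }, m.to);
// });
```

With three split points and two shards, this plan moves the chunks starting at Garcia and Smith to shardB, leaving two chunks on each shard.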

9.5.2 Administration

I'll round out this chapter with a few words about sharding administration.

MONITORING

A shard cluster is a complex piece of machinery; as such, you should monitor it

closely. The serverStatus and currentOp() commands can be run on any mongos,

and their output will reflect aggregate statistics across shards. I'll discuss these

commands in more detail in the next chapter.

[16] Note that such a document need not exist. That should be clear from the fact that you're splitting chunks on

an empty collection.
