Given this, it should be clear that sharding a collection at the last minute isn't a
good response to a performance problem. If you plan on sharding a collection at
some point in the future, you should do so well in advance of any anticipated
performance degradation.
Presplitting chunks for initial load
If you have a large data set that you need to load into a sharded collection, and you
know something about the distribution of the data, then you can save a lot of time by
presplitting and then premigrating chunks. For example, suppose you wanted to
import the spreadsheet data into a fresh MongoDB shard cluster. You can ensure that
the data distributes evenly upon import by first splitting and then migrating chunks
across shards. You can use the split and moveChunk commands to accomplish this.
These are aliased by the sh.splitAt() and sh.moveChunk() helpers, respectively.
Here's an example of a manual chunk split. You issue the split command, specify
the collection you want, and then indicate a split point:
> sh.splitAt( "cloud-docs.spreadsheets",
{ "username" : "Chen", "_id" : ObjectId("4d6d59db1d41c8536f001453") })
When run, this command will locate the chunk that logically contains the document
where username is Chen and _id is ObjectId("4d6d59db1d41c8536f001453").[16] It
then splits the chunk at that point, which results in two chunks. You can continue
splitting like this until you have a set of chunks that nicely distribute the data. You'll
want to make sure that you've created enough chunks to keep the average chunk size
well within the 64 MB split threshold. Thus, if you expect to load 1 GB of data, you
should plan to create around 20 chunks, which works out to roughly 50 MB per chunk.
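The chunk arithmetic and the choice of split points can be sketched in JavaScript, the language of the MongoDB shell. This is a minimal sketch under stated assumptions, not a prescribed procedure: estimateChunkCount and pickSplitPoints are hypothetical helpers, the 52 MB target is an assumed safety margin below the 64 MB threshold, and the sorted username list is assumed to be a representative sample of the data you plan to load.

```javascript
// Hypothetical helper: how many chunks are needed to keep the average
// chunk size at or below targetChunkMB (an assumed safety target).
function estimateChunkCount(totalDataMB, targetChunkMB) {
  return Math.ceil(totalDataMB / targetChunkMB);
}

// Hypothetical helper: choose chunkCount - 1 evenly spaced split points
// from a sorted sample of usernames (assumed representative of the data).
function pickSplitPoints(sortedUsernames, chunkCount) {
  const points = [];
  for (let i = 1; i < chunkCount; i++) {
    const idx = Math.floor((i * sortedUsernames.length) / chunkCount);
    points.push(sortedUsernames[idx]);
  }
  return points;
}

// 1 GB of data at roughly 52 MB per chunk.
const chunkCount = estimateChunkCount(1024, 52);

// In the MongoDB shell, each split point could then be applied with the
// helper shown above. MinKey is used here as an assumed lower bound for _id:
//
//   pickSplitPoints(sample, chunkCount).forEach(function (name) {
//     sh.splitAt("cloud-docs.spreadsheets", { username: name, _id: MinKey });
//   });
```

With these inputs the estimate comes to 20 chunks, in line with the figure above. A small or skewed sample can yield duplicate split points, so it's worth checking the list before issuing the splits.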
The second step is to ensure that all shards have roughly the same number of
chunks. Since all chunks will initially reside on the same shard, you'll need to move
them. Each chunk can be moved using the moveChunk command. The helper method
simplifies this:
> sh.moveChunk("cloud-docs.spreadsheets", {username: "Chen"}, "shardB")
This says that you want to move the chunk that logically would contain the document
{username: "Chen"} to shard B.
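When there are many chunks, the moves can be planned up front rather than typed one at a time. The following is a rough sketch of a round-robin plan, assuming that the first name in shardNames is the shard that initially holds every chunk and that splitPoints lists the usernames you split on, in order. planMoves and the shard names are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: assign chunks to shards round-robin.
// The chunk below the first split point stays where it is (shardNames[0]);
// the chunk starting at splitPoints[i] goes to shardNames[(i + 1) % n].
function planMoves(splitPoints, shardNames) {
  const moves = [];
  splitPoints.forEach(function (username, i) {
    const target = shardNames[(i + 1) % shardNames.length];
    // Chunks assigned to the initial shard are already in place.
    if (target !== shardNames[0]) {
      moves.push({ find: { username: username }, to: target });
    }
  });
  return moves;
}

const plan = planMoves(["Chen", "Kim", "Park"], ["shardA", "shardB"]);

// In the MongoDB shell, the plan could then be applied with the helper
// shown above:
//
//   plan.forEach(function (move) {
//     sh.moveChunk("cloud-docs.spreadsheets", move.find, move.to);
//   });
```

With three split points and two shards, this plan moves two of the four chunks to shardB and leaves the other two on shardA, so each shard ends up with the same number of chunks.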
9.5.2 Administration
I'll round out this chapter with a few words about sharding administration.
MONITORING
A shard cluster is a complex piece of machinery; as such, you should monitor it
closely. The serverStatus and currentOp() commands can be run on any mongos,
and their output will reflect aggregate statistics across shards. I'll discuss these
commands in more detail in the next chapter.
[16] Note that such a document need not exist. That should be clear from the fact that you're splitting chunks on
an empty collection.