Sharding - MongoDB: The Definitive Guide

Databases Reference

In-Depth Information

Incrementing Shard Keys Versus Random Shard Keys

The distribution of inserts across shards is very dependent on which key we're

sharding on.

If we choose to shard on something like "timestamp" , where the value is probably going

to increase and not jump around a lot, we'll be sending all of the inserts to one shard

(the one with the [June 27, 2003, ∞] chunk). Notice that, if we add a new shard and it

splits the data again, we'll still be inserting on just one server. If we add a new shard,

MongoDB might split [June 27, 2003, ∞] into [June 27, 2003, December 12, 2010) and

[December 12, 2010, ∞]. We'll always have a chunk that will be “some date through

infinity,” which is where our inserts will be going. This isn't good for a very high write

load, but it will make queries on the shard key very efficient.

If we have a high write load and want to evenly distribute writes across multiple shards,

we should pick a shard key that will jump around more. This could be a hash of the

timestamp in the log example or a key like "logMessage" , which won't have any par-

ticular pattern to it.

Whether your shard key jumps around or increases steadily, it is important to choose

a key that will vary somewhat. If, for example, we had a "logLevel" key that had only

values "DEBUG" , "WARN" , or "ERROR" , MongoDB won't be able to break up your data into

more than three chunks (because there are only three different values). If you have a

key with very little variation and want to use it as a shard key anyway, you can do so

by creating a compound shard key on that key and a key that varies more, like

"logLevel" and "timestamp" .

Determining which key to shard on and creating shard keys should be reminiscent of

indexing, because the two concepts are similar. In fact, often your shard key will just

be the index you use most often.

How Shard Keys Affect Operations

To the end user, a sharded setup should be indistinguishable from a nonsharded one.

However, it can be useful, especially when setting up sharding, to understand how

different queries will be done depending on the shard key chosen.

Suppose we have the collection described in the previous section, which is sharded on

the "name" key and has three shards with names ranging from A to Z. Different queries

will be executed in different ways:

db.people.find({"name" : "Susan"})

mongos will send this query directly to the Q-Z shard, receive a response from that

shard, and send it to the client.

db.people.find({"name" : {"$lt" : "L"}})

mongos will send the query to the A-F and G-P shards in serial. It will forward their

responses to the client.

Search WWH ::

Custom Search

Home