Databases Reference
In-Depth Information
Incrementing Shard Keys Versus Random Shard Keys
The distribution of inserts across shards is very dependent on which key we're
sharding on.
If we choose to shard on something like "timestamp" , where the value is probably going
to increase and not jump around a lot, we'll be sending all of the inserts to one shard
(the one with the [June 27, 2003, ∞] chunk). Notice that, if we add a new shard and it
splits the data again, we'll still be inserting on just one server. If we add a new shard,
MongoDB might split [June 27, 2003, ∞] into [June 27, 2003, December 12, 2010) and
[December 12, 2010, ∞]. We'll always have a chunk that will be “some date through
infinity,” which is where our inserts will be going. This isn't good for a very high write
load, but it will make queries on the shard key very efficient.
If we have a high write load and want to evenly distribute writes across multiple shards,
we should pick a shard key that will jump around more. This could be a hash of the
timestamp in the log example or a key like "logMessage" , which won't have any par-
ticular pattern to it.
Whether your shard key jumps around or increases steadily, it is important to choose
a key that will vary somewhat. If, for example, we had a "logLevel" key that had only
values "DEBUG" , "WARN" , or "ERROR" , MongoDB won't be able to break up your data into
more than three chunks (because there are only three different values). If you have a
key with very little variation and want to use it as a shard key anyway, you can do so
by creating a compound shard key on that key and a key that varies more, like
"logLevel" and "timestamp" .
Determining which key to shard on and creating shard keys should be reminiscent of
indexing, because the two concepts are similar. In fact, often your shard key will just
be the index you use most often.
How Shard Keys Affect Operations
To the end user, a sharded setup should be indistinguishable from a nonsharded one.
However, it can be useful, especially when setting up sharding, to understand how
different queries will be done depending on the shard key chosen.
Suppose we have the collection described in the previous section, which is sharded on
the "name" key and has three shards with names ranging from A to Z. Different queries
will be executed in different ways:
db.people.find({"name" : "Susan"})
mongos will send this query directly to the Q-Z shard, receive a response from that
shard, and send it to the client.
db.people.find({"name" : {"$lt" : "L"}})
mongos will send the query to the A-F and G-P shards in serial. It will forward their
responses to the client.
 
Search WWH ::




Custom Search