Sharded collections permit unique indexes on the _id field and on the shard
key only. Unique indexes are prohibited elsewhere because enforcing them
would require intershard communication, which is complicated and still
deemed too slow to be worth implementing.
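A toy model can make the uniqueness restriction concrete. The sketch below is not MongoDB code: the Shard class, the user_id shard key, the email field, and the split at 100 are all hypothetical names chosen for illustration. Each shard checks uniqueness only against its own data, so two documents with the same email but different shard-key values are both accepted.

```python
class Shard:
    """Toy shard: a set stands in for a local unique index on 'email'."""

    def __init__(self):
        self.emails = set()

    def insert(self, doc):
        # A shard can only check the documents it holds itself.
        if doc["email"] in self.emails:
            raise ValueError("duplicate key: " + doc["email"])
        self.emails.add(doc["email"])


shards = [Shard(), Shard()]


def route(doc):
    # Hypothetical range-based routing on user_id: values below 100 go to
    # the first shard, everything else to the second.
    return shards[0] if doc["user_id"] < 100 else shards[1]


def insert(doc):
    route(doc).insert(doc)


insert({"user_id": 1, "email": "a@example.com"})
# Same email, different shard key: lands on the other shard and is accepted.
insert({"user_id": 500, "email": "a@example.com"})
```

Catching the second insert would mean asking every shard whether it already holds that email, which is the intershard communication described above.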
Once you understand how queries are routed and how indexing works, you should be
in a good position to write smart queries and indexes for your sharded cluster. Nearly all
the advice on indexing and query optimization from chapter 7 will apply, and you have
a powerful explain() tool to use when an empirical investigation proves necessary.
9.4 Choosing a shard key
So much depends upon the right choice of shard key. A poorly chosen shard key will
prevent your application from taking advantage of many of the benefits provided by
the sharded cluster. In the pathological case, both insert and query performance will
be significantly impaired. Adding to the gravity of the decision is that once you've
chosen a shard key, you're stuck with it. Shard keys are immutable.¹²
Part of having a good experience with sharding is knowing what makes a good
shard key. Because this isn't immediately intuitive, I'll start by describing the kinds of
shard keys that don't work well. This will naturally lead to a discussion of the ones that
do.
9.4.1 Ineffective shard keys
Some shard keys distribute poorly. Others make it impossible to take advantage of the
principle of locality. Still others potentially prevent chunks from splitting. Here we
take a look at the kinds of shard keys that generate these sub-optimal states.
POOR DISTRIBUTION
The BSON object ID is the default primary key for every MongoDB document. A data
type so close to the heart of MongoDB would at first appear a promising candidate
for a shard key. Alas, this appearance is deceiving. Recall that the most significant bits
of all object IDs form a timestamp. This means that object IDs are always ascending.
And, unfortunately, ascending values make for terrible shard keys.
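To see why a leading timestamp forces ascending order, here is a minimal sketch in Python. It assumes only the layout described above: a 12-byte value whose first 4 bytes are a big-endian timestamp in seconds. The helper names make_object_id and timestamp_of are hypothetical, and the remaining 8 bytes are filled with placeholder values rather than what a real driver would generate.

```python
import struct


def make_object_id(timestamp, suffix=b"\x00" * 8):
    """Build a 12-byte ID: a 4-byte big-endian timestamp, then 8 more bytes."""
    return struct.pack(">I", timestamp) + suffix


def timestamp_of(object_id):
    """Read the timestamp back out of the 4 most significant bytes."""
    return struct.unpack(">I", object_id[:4])[0]


# Two IDs created one second apart. The older one is deliberately given the
# largest possible suffix and the newer one the smallest.
older = make_object_id(1277761622, suffix=b"\xff" * 8)
newer = make_object_id(1277761623, suffix=b"\x00" * 8)

# Byte strings compare from the most significant byte down, so the timestamp
# decides the order before the suffix is ever looked at.
print(older < newer)
print(timestamp_of(newer))
```

Because comparison starts at the most significant bytes, an ID created in a later second sorts after every ID created earlier, whatever the trailing bytes hold.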
To see the problem with ascending shard keys, you need to remember that sharding
is range-based. With an ascending shard key, all the most recent inserts will fall
within some narrow continuous range. In sharding terms, this means that these inserts
will be routed to a single chunk, and thus to a single shard. This effectively nullifies
one of sharding's greatest benefits: the automatic distribution of the insert load across
machines.¹³ It should be clear that if you want the insert load to be distributed across
shards, you can't use an ascending shard key. You need something more random.
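The effect is easy to reproduce with a small simulation. The sketch below is a toy model, not how mongos is implemented: the split points, shard names, and integer key space are all made up for illustration. It routes each key to the chunk whose range contains it and counts how many inserts each shard receives.

```python
import bisect
import random
from collections import Counter

# Hypothetical split points dividing the key space [0, 1000) into four chunks,
# each assumed to live on its own shard.
SPLIT_POINTS = [250, 500, 750]
SHARDS = ["shard-a", "shard-b", "shard-c", "shard-d"]


def route(key):
    """Return the shard whose chunk range [lower, upper) contains the key."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]


def insert_load(keys):
    """Count how many inserts each shard would receive."""
    return Counter(route(key) for key in keys)


# Ascending keys: the 100 newest inserts all sit at the top of the key space.
ascending = insert_load(range(900, 1000))

# Scattered keys: 100 inserts drawn at random from the whole key space.
rng = random.Random(0)
scattered = insert_load(rng.randrange(0, 1000) for _ in range(100))

print("ascending:", dict(ascending))
print("scattered:", dict(scattered))
```

In this model every ascending key is routed to shard-d, while the randomly drawn keys are spread over several shards. The seeded random generator here only stands in for "something more random"; it isn't a recommendation for how to build a real shard key.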
¹² Note that there's no good way to alter the shard key once you've created it. Your best bet is to create a new
sharded collection with the proper key, export the data from the old sharded collection, and then restore the
data to the new one.
¹³ Note that an ascending shard key shouldn't affect updates as long as documents are updated randomly.