Operational Intelligence - MongoDB Applied Design Patterns

Because MongoDB distributes data in chunks based on ranges of the shard key, the choice of shard key controls how MongoDB distributes data and, as a result, the system's capacity for writes and queries.

Ideally, your shard key should have two characteristics:

▪ Insertions are balanced between shards

▪ Most queries can be routed to a subset of the shards to be satisfied

Here are some initially appealing options for shard keys which, on closer examination, fail to meet at least one of these criteria:

Timestamps

Shard keys based on a timestamp or on insertion time (such as the ObjectId) increase monotonically, so every new document goes into the “high” chunk, and therefore to a single shard. The inserts are not balanced.

Hashes

If the shard key is random, as with a hash, then related documents are scattered across all chunks, and any query that does not specify an exact shard key value must be broadcast to all shards. The queries are not routable.
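The routing difference can be illustrated with a small simulation. This is a minimal sketch and not MongoDB code: the split points, the shard names, and the `toy_hash` function (an MD5 prefix) are assumptions made for illustration, and MongoDB's own hashing and chunk layout differ. It shows which shards hold documents for one time window under a ranged key and under a hashed key.

```python
import bisect
import hashlib

def chunk_index(split_points, key):
    """Index of the chunk whose key range contains `key`."""
    return bisect.bisect_right(split_points, key)

def shards_for_range(split_points, chunk_to_shard, low, high):
    """Shards owning any chunk that overlaps low <= key <= high."""
    first = chunk_index(split_points, low)
    last = chunk_index(split_points, high)
    return {chunk_to_shard[i] for i in range(first, last + 1)}

def toy_hash(value):
    """Stand-in 32-bit hash (MD5 prefix); not MongoDB's internal hash function."""
    return int(hashlib.md5(str(value).encode("utf-8")).hexdigest()[:8], 16)

chunk_to_shard = ["shard0", "shard1", "shard2", "shard3"]

# Ranged key: timestamps 0..999 split into four equal chunks.
time_splits = [250, 500, 750]
range_shards = shards_for_range(time_splits, chunk_to_shard, 100, 200)

# Hashed key: the same four shards, but chunks split the 32-bit hash space.
hash_splits = [2**30, 2**31, 3 * 2**30]
hashed_shards = {
    chunk_to_shard[chunk_index(hash_splits, toy_hash(ts))]
    for ts in range(100, 201)
}

print(sorted(range_shards))   # only the shard owning timestamps below 250
print(sorted(hashed_shards))  # the same time window is spread across shards
```

With the ranged key, the query for timestamps 100 to 200 touches a single shard. With the hashed key, documents from that window live on every shard, so a router has no choice but to ask all of them.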

We'll now examine these options in more detail.

Option 1: Shard by time

Although using the timestamp, or the ObjectId in the _id field, would eventually distribute your data evenly among shards as chunks are split and migrated, these keys lead to two problems:

▪ At any given time, all inserts flow to the same shard, which means that your sharded cluster will have the same write throughput as a standalone instance.

▪ Most reads will tend to cluster on the same shard, assuming you access recent data more frequently.
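The write hot spot described above can be sketched in a few lines. This is a simplified model under stated assumptions: the split points and shard names are hypothetical, integers stand in for timestamps, and the model ignores the chunk splitting and balancing that a real cluster performs over time.

```python
import bisect
from collections import Counter

def route_inserts(split_points, chunk_to_shard, keys):
    """Count how many inserts each shard receives under range-based chunks."""
    counts = Counter()
    for key in keys:
        chunk = bisect.bisect_right(split_points, key)
        counts[chunk_to_shard[chunk]] += 1
    return counts

# Existing data covers timestamps below 1000; chunks were split at these points.
split_points = [250, 500, 750]
chunk_to_shard = ["shard0", "shard1", "shard2", "shard3"]

# New events arrive with ever-increasing timestamps.
new_timestamps = range(1000, 1100)
insert_counts = route_inserts(split_points, chunk_to_shard, new_timestamps)

print(dict(insert_counts))  # every new insert lands on the shard owning the last chunk
```

Because every new key is larger than every existing split point, all 100 inserts are routed to the shard that owns the highest chunk, and the other three shards receive none.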

Option 2: Shard by a semi-random key

To distribute data more evenly among the shards, you may consider using a more “random” piece of data, such as a hash of the _id field (i.e., a hash of the ObjectId), as the shard key.

While this introduces some additional complexity into your application, which must generate the key, it will distribute writes among the shards. In these deployments, having five shards will provide roughly five times the write capacity of a single instance.
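One way the application might generate such a key is sketched below. The field name `shard_key`, the helper names, and the choice of MD5 are assumptions for illustration. Sequential integers stand in for ObjectIds, and the equal-width placement over the hash space is a simplification of how chunks would actually be laid out.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 5  # matches the five-shard example in the text

def shard_key_for(doc_id):
    """Hypothetical app-side shard key: hex MD5 digest of the document's _id."""
    return hashlib.md5(str(doc_id).encode("utf-8")).hexdigest()

def shard_for_key(key_hex):
    """Simplified placement: equal-width ranges over the first 32 bits of the key."""
    return int(key_hex[:8], 16) * NUM_SHARDS // 2**32

# Sequential integers stand in for ObjectIds, which also increase over time.
docs = [{"_id": i, "shard_key": shard_key_for(i)} for i in range(10_000)]
per_shard = Counter(shard_for_key(d["shard_key"]) for d in docs)

print(sorted(per_shard.items()))  # counts per shard come out roughly equal
```

Even though the _id values arrive in increasing order, their hashes fall across the whole key space, so each of the five shards takes about a fifth of the writes.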
