Database Reference
In-Depth Information
Because MongoDB distributes data using chunks based on ranges of the shard key , the choice
of shard key can control how MongoDB distributes data and the resulting systems' capacity
for writes and queries.
Ideally, your shard key should have two characteristics:
▪ Insertions are balanced between shards
▪ Most queries can be routed to a subset of the shards to be satisfied
Here are some initially appealing options for shard keys, which on closer examination, fail to
meet at least one of these criteria:
Timestamps
Shard keys based on the timestamp or the insertion time (i.e., the ObjectId ) end up all
going in the “high” chunk, and therefore to a single shard. The inserts are not balanced .
Hashes
If the shard key is random, as with a hash, then all queries must be broadcast to all shards.
The queries are not routeable .
We'll now examine these options in more detail.
Option 1: Shard by time
Although using the timestamp, or the ObjectId in the _id field, would distribute your data
evenly among shards, these keys lead to two problems:
▪ All inserts always flow to the same shard, which means that your shard cluster will have
the same write throughput as a standalone instance.
▪ Most reads will tend to cluster on the same shard, assuming you access recent data more
frequently.
Option 2: Shard by a semi-random key
To distribute data more evenly among the shards, you may consider using a more “random”
piece of data, such as a hash of the _id field (i.e., the ObjectId as a shard key).
While this introduces some additional complexity into your application, to generate the key, it
will distribute writes among the shards. In these deployments, having five shards will provide
five times the write capacity as a single instance.
Search WWH ::




Custom Search