Sharding Concerns
Database performance for these kinds of deployments depends on indexes. You may use
sharding to enhance performance by allowing MongoDB to keep larger portions of those
indexes in RAM. In sharded configurations, you should select a shard key that allows the
server to route queries directly to a single shard or small group of shards.
Since most of the queries in this system include the type field, it should be included in the
shard key. Beyond this, the remainder of the shard key is difficult to predict without
information about your database's actual activity and distribution. There are a few things we
can say a priori, however:
▪ details.issue_date would be a poor addition to the shard key because, although it
appears in a number of queries, no query was selective by this field, so mongos would not be
able to route such queries based on the shard key.
▪ It's good to include in the shard key some fields from the detail document that are
frequently queried, as well as some with an even distribution, to prevent large unsplittable
chunks.
In the following example, we've assumed that the details.genre field is the second-most
queried field after type. To enable sharding on these fields, we'll use the following
shardcollection command:
>>> db.command('shardcollection', 'dbname.product',
...            key={'type': 1, 'details.genre': 1, 'sku': 1})
{ "collectionsharded" : "dbname.product", "ok" : 1 }
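The routing argument behind this shard key can be sketched in plain Python. This is a simplified model, not the actual mongos routing logic: it assumes a range-based compound shard key and treats a query as routable to a subset of shards only when it constrains a leading prefix of that key. The helper names targeting_prefix and can_target, and the sample filter values, are hypothetical.

```python
# Shard key fields in the order used by the shardcollection command above.
SHARD_KEY = ("type", "details.genre", "sku")


def targeting_prefix(query, shard_key=SHARD_KEY):
    """Return the leading shard-key fields that the query filter constrains."""
    prefix = []
    for field in shard_key:
        if field not in query:
            break  # a gap ends the usable prefix
        prefix.append(field)
    return tuple(prefix)


def can_target(query, shard_key=SHARD_KEY):
    """True if the filter constrains at least the first shard-key field."""
    return len(targeting_prefix(query, shard_key)) > 0


# Hypothetical query filters, for illustration only.
by_type_and_genre = {"type": "Audio Album", "details.genre": "Jazz"}
by_issue_date = {"details.issue_date": {"$gt": "2011-01-01"}}
by_genre_only = {"details.genre": "Jazz"}

print(can_target(by_type_and_genre))  # True
print(can_target(by_issue_date))      # False
print(can_target(by_genre_only))      # False
```

In this model, a filter on details.genre alone cannot be targeted because it skips the leading type field, which is why the most commonly queried field comes first in the key.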
Scaling read performance without sharding
While sharding is the best way to scale operations, some data sets make it impossible to
partition data so that mongos can route queries to specific shards. In these situations, mongos
sends the query to all shards and then combines the results before returning them to the client.
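The broadcast-and-combine behavior can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: each shard is represented by a plain Python list of documents, the combine step is assumed to be a merge of per-shard sorted results, and the function name scatter_gather and the sample documents are hypothetical.

```python
import heapq


def scatter_gather(shards, predicate, sort_key):
    """Apply the same filter to every shard, then merge the sorted partial results."""
    # Scatter: every shard evaluates the query, whether or not it holds matches.
    partials = [sorted(filter(predicate, docs), key=sort_key) for docs in shards]
    # Gather: merge the already-sorted partial results into one ordered list.
    return list(heapq.merge(*partials, key=sort_key))


# Two hypothetical shards holding product documents.
shards = [
    [{"sku": "003", "price": 12}, {"sku": "001", "price": 30}],
    [{"sku": "002", "price": 8}, {"sku": "004", "price": 25}],
]

result = scatter_gather(shards, lambda d: d["price"] >= 10, lambda d: d["sku"])
print([d["sku"] for d in result])  # ['001', '003', '004']
```

Every shard does work for every query in this model, which is the cost that a well-chosen shard key avoids.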
In such cases, you can gain some additional read performance by allowing mongos to
read from the secondary mongod instances in a replica set by configuring the read preference
in the client. Read preference is configurable on a per-connection or per-operation basis. In
pymongo, this is set using the read_preference keyword argument.
The pymongo.SECONDARY argument in the following example permits reads from the
secondary (as well as a primary) for the entire connection:
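As a minimal sketch of a connection-wide setting, the read preference can also be expressed through the standard readPreference option of a MongoDB connection string, where the secondaryPreferred mode reads from secondaries and falls back to the primary. The helper name build_mongo_uri, the host names, and the replica set name rs0 are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_mongo_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Return a connection string that sets a connection-wide read preference."""
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,
    })
    return "mongodb://" + ",".join(hosts) + "/?" + options


# Hypothetical hosts and replica set name.
uri = build_mongo_uri(["db1.example.net:27017", "db2.example.net:27017"], "rs0")
print(uri)
```

With a recent PyMongo release, a URI like this can be passed to pymongo.MongoClient(uri), so every read on that client uses the chosen mode unless an individual operation overrides it.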