In sharded environments, the performance of aggregation operations depends on the shard
key. Ideally, all the items in a particular $group operation will reside on the same server.
Although this distribution of documents would occur if you chose the time field as the shard
key, a field like path also has this property and is a typical choice for sharding. See Sharding
Concerns for additional recommendations concerning sharding.
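Why the shard key matters for $group can be shown with a toy model. The sketch below places each event on a shard by range-partitioning a shard-key value, then reports which shards hold each group's documents. It is a simplified sketch, not MongoDB's placement logic: the chunk boundaries, the EVENTS sample, and the helper names are hypothetical, and a real cluster's balancer manages chunk placement.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical chunk boundaries for a range-partitioned shard key:
# value < "/b" -> shard 0, "/b" <= value < "/m" -> shard 1, value >= "/m" -> shard 2.
BOUNDARIES = ["/b", "/m"]

# Hypothetical sample of logged events.
EVENTS = [
    {"path": "/about.html", "host": "web1"},
    {"path": "/index.html", "host": "web2"},
    {"path": "/index.html", "host": "web1"},
    {"path": "/news.html", "host": "web2"},
]


def shard_for(key_value):
    """Return the index of the shard whose key range contains key_value."""
    return bisect_right(BOUNDARIES, key_value)


def shards_per_group(events, shard_key, group_key):
    """Map each $group key value to the set of shards holding its documents."""
    placement = defaultdict(set)
    for doc in events:
        placement[doc[group_key]].add(shard_for(doc[shard_key]))
    return dict(placement)


# Sharded on path, grouped on path: every group sits on exactly one shard.
print(shards_per_group(EVENTS, "path", "path"))
# Sharded on path, grouped on host: groups span shards and must be merged.
print(shards_per_group(EVENTS, "path", "host"))
```

When the group key matches the shard key, each shard can finish its groups locally. Otherwise, partial results from several shards must be combined.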
SQL EQUIVALENTS
To translate statements from the aggregation framework to SQL, you can consider
$match equivalent to WHERE, $project to SELECT, and $group to GROUP BY.
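The pipeline below makes this mapping concrete by counting events per path over a time window. Each stage carries its rough SQL counterpart as a comment. The time and path fields are the event fields used in this section. The page_views_pipeline helper and the views output field are hypothetical names chosen for illustration.

```python
from datetime import datetime


def page_views_pipeline(start, end):
    """Count events per path with start <= time < end.

    Rough SQL equivalent:
        SELECT path, COUNT(*) AS views FROM events
        WHERE time >= :start AND time < :end
        GROUP BY path
    """
    return [
        # WHERE: filter first, so this stage can use an index on time.
        {"$match": {"time": {"$gte": start, "$lt": end}}},
        # SELECT: keep only the field the later stages need.
        {"$project": {"_id": 0, "path": 1}},
        # GROUP BY path; COUNT(*) becomes a sum of 1 per document.
        {"$group": {"_id": "$path", "views": {"$sum": 1}}},
    ]


pipeline = page_views_pipeline(datetime(2012, 10, 1), datetime(2012, 10, 2))
```

With PyMongo connected to a server, you would pass this list as db.events.aggregate(pipeline). In recent PyMongo releases, that call returns a cursor over documents of the form { _id: path, views: count }.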
To optimize the aggregation operation, make sure an index supports the initial $match
query. In this case, the command is simple, and the index is one we already have:
>>> db.events.create_index('time')  # ensure_index('time') in older PyMongo; removed in 4.0
If you have already created a compound index on time and host (i.e., { time: 1, host:
1 }), MongoDB will use that index for range queries on the time field alone. In that
situation, there is no benefit to creating an additional index on time alone.
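The prefix rule behind this advice fits in a few lines of plain Python. The supports_prefix_query helper is hypothetical and deliberately simplified. It only checks whether the queried fields are the leading fields of the compound index, and it ignores sort direction and the query planner's other choices. Index keys are written as (field, direction) pairs, the same form that PyMongo's create_index accepts.

```python
def supports_prefix_query(index_keys, query_fields):
    """Return True if query_fields are exactly the leading fields of the index.

    index_keys: list of (field, direction) pairs, e.g. [("time", 1), ("host", 1)].
    query_fields: the fields a query filters on, in any order.
    """
    if len(query_fields) > len(index_keys):
        return False
    leading = {field for field, _direction in index_keys[: len(query_fields)]}
    return leading == set(query_fields)


TIME_HOST = [("time", 1), ("host", 1)]

print(supports_prefix_query(TIME_HOST, ["time"]))          # True: time is a prefix
print(supports_prefix_query(TIME_HOST, ["host", "time"]))  # True: both index fields
print(supports_prefix_query(TIME_HOST, ["host"]))          # False: host alone is not a prefix
```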
Sharding Concerns
Eventually, your system's events will exceed the capacity of a single event logging database
instance. In these situations you will want to use a sharded cluster, which takes advantage of
MongoDB's automatic sharding functionality. In this section, we introduce the sharding
concerns unique to the event logging use case.
Limitations
In a sharded environment, the maximum insertion rate is limited by:
▪ The number of shards in the cluster
▪ The shard key you choose
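A toy range-partitioning model shows why the shard key can cap insertion throughput. Suppose the key increases monotonically, as a timestamp does. Every new event then sorts after the last split point, so all inserts land on one shard no matter how many shards the cluster has. A key like path spreads the same volume of inserts across shards. The boundaries and keys below are hypothetical, and the sketch ignores the chunk splitting and migration that a real cluster performs over time.

```python
from bisect import bisect_right
from collections import Counter


def insert_distribution(keys, boundaries):
    """Count inserts per shard when shard i owns keys in [boundaries[i-1], boundaries[i]).

    The first shard's range is open below and the last shard's range is open above.
    """
    return Counter(bisect_right(boundaries, key) for key in keys)


# Timestamps only grow, so new events all fall past the last split point (200).
by_time = insert_distribution(range(1000, 1010), boundaries=[100, 200])

# Paths are spread across the key space, so inserts reach every shard.
paths = ["/about", "/index", "/news", "/search", "/blog", "/tags"]
by_path = insert_distribution(paths, boundaries=["/f", "/p"])

print(by_time)  # all ten inserts go to shard 2
print(by_path)  # two inserts go to each of shards 0, 1, and 2
```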