Operational Intelligence - MongoDB Applied Design Patterns

In sharded environments, the performance of aggregation operations depends on the shard key. Ideally, all the items in a particular $group operation will reside on the same server. Although this distribution of documents would occur if you chose the time field as the shard key, a field like path also has this property and is a typical choice for sharding. See Sharding Concerns for additional recommendations concerning sharding.
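To make the locality argument concrete, here is a toy model in plain Python. It is only a sketch: the shard_for placement rule, the NUM_SHARDS value, and the sample documents are all assumptions made for illustration, and MongoDB's real chunk-based placement works differently. The point it shows is that when the shard key matches the $group field, every document in a group lands on one shard.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(value):
    # Toy placement rule (an assumption, not MongoDB's algorithm):
    # a deterministic hash of the shard key value picks the shard.
    return zlib.crc32(str(value).encode("utf-8")) % NUM_SHARDS


def shards_per_group(docs, shard_key, group_field):
    # For each group value, count how many distinct shards hold its documents.
    touched = defaultdict(set)
    for doc in docs:
        touched[doc[group_field]].add(shard_for(doc[shard_key]))
    return {group: len(shards) for group, shards in touched.items()}


# Made-up events: 100 documents spread over three paths.
paths = ["/index.html", "/about.html", "/index.html", "/faq.html"] * 25
docs = [{"_id": i, "path": path} for i, path in enumerate(paths)]

by_path = shards_per_group(docs, "path", "path")  # shard key == group field
by_id = shards_per_group(docs, "_id", "path")     # shard key unrelated to group
```

In this model every entry of by_path is 1, because documents sharing a path always map to the same shard. Sharding on an unrelated field such as _id will typically scatter each group across several shards, so the servers have to combine partial results.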

SQL EQUIVALENTS

To translate statements from the aggregation framework to SQL, you can consider $match equivalent to WHERE, $project to SELECT, and $group to GROUP BY.
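As a rough illustration of this mapping, the sketch below builds a pipeline as plain Python data and places the SQL clause each stage corresponds to beside it. The date range, the hits field name, and the SQL text are assumptions chosen for the example, not taken from the original query.

```python
from datetime import datetime

# Hypothetical reporting window; the dates are illustrative only.
start = datetime(2000, 10, 1)
end = datetime(2000, 11, 1)

pipeline = [
    # WHERE time >= :start AND time < :end
    {"$match": {"time": {"$gte": start, "$lt": end}}},
    # SELECT path
    {"$project": {"path": 1}},
    # GROUP BY path, with COUNT(*) AS hits
    {"$group": {"_id": "$path", "hits": {"$sum": 1}}},
]

# Approximate SQL counterpart of the pipeline above.
equivalent_sql = (
    "SELECT path, COUNT(*) AS hits "
    "FROM events "
    "WHERE time >= :start AND time < :end "
    "GROUP BY path"
)
```

With a live connection, passing this list to db.events.aggregate(pipeline) would run it against the events collection.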

In order to optimize the aggregation operation, you must ensure that the initial $match query has an index. In this case, the command is simple, and it's an index we already have:

>>> db.events.ensure_index('time')

If you have already created a compound index on the time and host fields (i.e., { time: 1, host: 1 }), MongoDB will use this index for range queries on just the time field. In situations like this, there's no benefit to creating an additional index for just time.
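The rule behind this advice is that a compound index can serve queries on its leading field. The helper below is a deliberately simplified model of that rule, not how MongoDB's query planner is implemented; index_supports is a hypothetical name introduced only for this sketch.

```python
def index_supports(index_keys, query_field):
    """Return True if query_field is the leading key of the index.

    Simplified model: a query on a single field is treated as
    well served only when that field is the first index key.
    """
    return len(index_keys) > 0 and index_keys[0][0] == query_field


# The compound index from the text, as (field, direction) pairs.
compound = [("time", 1), ("host", 1)]

time_covered = index_supports(compound, "time")  # leading key
host_covered = index_supports(compound, "host")  # not the leading key
```

Here time_covered is True and host_covered is False, which is why a separate index on time alone would be redundant, while a query on host alone would still want its own index.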

Sharding Concerns

Eventually, your system's events will exceed the capacity of a single event logging database instance. In these situations you will want to use a sharded cluster, which takes advantage of MongoDB's automatic sharding functionality. In this section, we introduce the unique sharding concerns for the event logging use case.

Limitations

In a sharded environment, the limitations on the maximum insertion rate are:

▪ The number of shards in the cluster

▪ The shard key you choose
