In sharded environments, the performance of aggregation operations depends on the shard key. Ideally, all the items in a particular $group operation will reside on the same server.
Although this distribution of documents would occur if you chose the time field as the shard [...] Concerns for additional recommendations concerning sharding.
SQL EQUIVALENTS
To translate statements from the aggregation framework to SQL, you can consider the $match equivalent to WHERE, $project to SELECT, and $group to GROUP BY.
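As a rough illustration of this mapping, the sketch below builds a pipeline as plain Python data and annotates each stage with the SQL clause it corresponds to. The helper name hits_per_host_pipeline, the hits output field, and the hand-written SQL string are assumptions made for the example; only the events collection and the time and host fields come from the surrounding text.

```python
from datetime import datetime


def hits_per_host_pipeline(start, end):
    """Build a pipeline that counts events per host in [start, end)."""
    return [
        # WHERE time >= :start AND time < :end
        {'$match': {'time': {'$gte': start, '$lt': end}}},
        # SELECT host
        {'$project': {'host': 1}},
        # GROUP BY host, with COUNT(*) AS hits
        {'$group': {'_id': '$host', 'hits': {'$sum': 1}}},
    ]


# Roughly equivalent SQL, written by hand for comparison (illustrative only).
EQUIVALENT_SQL = (
    "SELECT host, COUNT(*) AS hits FROM events "
    "WHERE time >= :start AND time < :end GROUP BY host"
)

pipeline = hits_per_host_pipeline(datetime(2000, 10, 10),
                                  datetime(2000, 10, 11))
# With a live connection, this list would be passed to
# db.events.aggregate(pipeline).
```

The correspondence is approximate: $project and $group can compute derived values that a single SELECT or GROUP BY clause would not express in the same way.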
In order to optimize the aggregation operation, you must ensure that the initial $match query has an index. In this case, the command would be simple, and it's an index we already have:
>>> db.events.ensure_index('time')
If you have already created a compound index on the time and host fields (i.e., { time: 1, host: 1 }), MongoDB will use this index for range queries on just the time field. In situations like this, there's no benefit to creating an additional index for just time.
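The reasoning here is the index-prefix rule: a query can use a compound index when the fields it filters on are the leading fields of that index. The sketch below is a deliberately simplified model of that rule, not MongoDB's query planner; the helper name index_covers_prefix is hypothetical.

```python
def index_covers_prefix(index_keys, query_fields):
    """Return True if query_fields are the leading fields of index_keys.

    index_keys: ordered list of (field, direction) pairs.
    query_fields: ordered list of field names the query filters on.
    """
    leading = [field for field, _direction in index_keys[:len(query_fields)]]
    return leading == list(query_fields)


# The compound index from the text: { time: 1, host: 1 }.
# In current PyMongo releases this key list would be passed to
# db.events.create_index(compound); ensure_index was removed in PyMongo 4.0.
compound = [('time', 1), ('host', 1)]

time_only = index_covers_prefix(compound, ['time'])  # time leads the index
host_only = index_covers_prefix(compound, ['host'])  # host does not lead it
```

Under this model, a query on time alone is served by the compound index, while a query on host alone would still need its own index.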
Sharding Concerns
Eventually, your system's events will exceed the capacity of a single event logging database instance. In these situations you will want to use a shard cluster, which takes advantage of MongoDB's automatic sharding functionality. In this section, we introduce the unique sharding concerns for the event logging use case.
Limitations
In a sharded environment, the limitations on the maximum insertion rate are:
▪ The number of shards in the cluster
▪ The shard key you choose