If you're going to issue this query in a production situation, you'll ideally have an index on helpful_votes. If you want the review with the greatest number of helpful votes for a particular product, you'll want a compound index on product_id and helpful_votes. If the reason for this isn't clear, refer to chapter 7.
5.4.2 Distinct
MongoDB's distinct command is the simplest tool for getting a list of distinct values for a particular key. The command works for both single keys and array keys. distinct covers an entire collection by default but can be constrained with a query selector.
You can use distinct to get a list of all the unique tags on the products collection as follows:
db.products.distinct("tags")
It's that simple. If you want to operate on a subset of the products collection, you can pass a query selector as the second argument. Here, the query limits the distinct tag values to products in the Gardening Tools category:
db.products.distinct("tags",
{category_id: ObjectId("6a5b1476238d3b4dd5000048")})
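To make the array-key behavior concrete, here is a rough in-memory sketch in plain JavaScript. It only imitates the observable result of distinct and isn't how the server implements the command. The sample documents, the string category values, and the helper name distinctValues are assumptions made up for illustration.

```javascript
// Hypothetical sample documents; all values are invented for illustration.
const products = [
  { name: "Wheelbarrow", category_id: "gardening", tags: ["tools", "gardening"] },
  { name: "Trowel", category_id: "gardening", tags: ["tools", "soil"] },
  { name: "Lamp", category_id: "lighting", tags: ["home"] },
];

// Collect the unique values stored under `key`.
// An array value contributes each of its elements, a scalar contributes itself.
// `matches` is an optional predicate standing in for a query selector.
function distinctValues(docs, key, matches = () => true) {
  const seen = new Set();
  for (const doc of docs) {
    if (!matches(doc)) continue;          // skip documents outside the subset
    const value = doc[key];
    if (value === undefined) continue;    // key missing on this document
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) seen.add(v);
  }
  return [...seen];
}

// Unique tags across every product.
const allTags = distinctValues(products, "tags");

// Unique tags restricted to one (hypothetical) category.
const gardeningTags = distinctValues(
  products,
  "tags",
  (doc) => doc.category_id === "gardening"
);

console.log(allTags);
console.log(gardeningTags);
```

The same helper handles a single-valued key such as category_id, which mirrors the point that the command works for both single keys and array keys.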
AGGREGATION COMMAND LIMITATIONS
For all their practicality, distinct and group suffer from a significant limitation: they can't return a result set greater than 16 MB. The 16 MB limit isn't a threshold imposed on these commands per se but rather on all initial query result sets. distinct and group are implemented as commands, which means that they're implemented as queries on the special $cmd collection, and their being queries is what subjects them to this limitation. If distinct or group can't handle your aggregation result size, then you'll want to use map-reduce instead, where the results can be stored in a collection rather than being returned inline.
5.4.3 Group
group, like distinct, is a database command, and thus its result set is subject to the same 16 MB response limitation. Furthermore, to reduce memory consumption, group won't process more than 10,000 unique keys. If your aggregate operation fits within these bounds, then group can be a good choice because it's frequently faster than map-reduce.
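As a mental model for the unique-key bound, the following plain JavaScript sketch mimics the general shape of a grouping pass in memory. It isn't the server's implementation of group. The function name groupCount, the maxKeys parameter, the error message, and the sample reviews are all assumptions introduced for illustration.

```javascript
// Hypothetical sample reviews; all values are invented for illustration.
const reviews = [
  { user_id: "u1", rating: 5 },
  { user_id: "u1", rating: 4 },
  { user_id: "u2", rating: 5 },
  { user_id: "u1", rating: 5 },
];

// Count documents per unique combination of the fields in `keyFields`.
// Throws as soon as a new key would push the number of unique keys past
// `maxKeys`, loosely imitating a cap on unique keys.
function groupCount(docs, keyFields, maxKeys = 10000) {
  const groups = new Map();
  for (const doc of docs) {
    // Build the key document from the grouping fields only.
    const keyDoc = {};
    for (const field of keyFields) keyDoc[field] = doc[field];
    const id = JSON.stringify(keyDoc);

    if (!groups.has(id)) {
      if (groups.size >= maxKeys) throw new Error("too many unique keys");
      groups.set(id, { ...keyDoc, count: 0 });
    }
    groups.get(id).count += 1;
  }
  return [...groups.values()];
}

// One result document per user.
const byUser = groupCount(reviews, ["user_id"]);

// One result document per (user, rating) pair.
const byUserAndRating = groupCount(reviews, ["user_id", "rating"]);

console.log(byUser);
console.log(byUserAndRating);
```

Each unique key holds one accumulator in memory for the whole pass, so memory use grows with the number of unique keys rather than the number of documents scanned.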
You've already seen a semi-involved example of grouping reviews by user. Let's quickly review the options to be passed to group:
key: A document describing the fields to group by. For instance, to group by category_id, you'd use {category_id: true} as your key. The key can also be compound. For instance, if you wanted to group a series of posts by user_id and rating, your key would look like this: {user_id: true, rating: true}. The key option is required unless you're using keyf.