SKIP AND LIMIT
There's nothing mysterious about the semantics of skip and limit. These query options should always work as you expect.
But you should beware of passing large values (say, values greater than 10,000) for skip because serving such queries requires scanning over a number of documents equal to the skip value. For example, imagine that you're paginating a million documents sorted by date, descending, with 10 results per page. This means that the query to display the 50,000th page will include a skip value of nearly 500,000, which is incredibly inefficient because the server must walk past every skipped document before it returns anything. A better strategy is to omit the skip altogether and instead add a range condition to the query that indicates where the next result set begins. Thus, this query
db.docs.find({}).skip(500000).limit(10).sort({date: -1})
becomes this:
db.docs.find({date: {$lt: previous_page_date}}).limit(10).sort({date: -1})
This second query will scan far fewer items than the first. The only potential problem is that if date isn't unique for each document, documents that share the boundary date may be skipped or displayed more than once, depending on whether the range comparison is strict. There are many strategies for dealing with this, but the solutions are left as exercises for the reader.
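One common strategy, sketched here under stated assumptions, is to break ties on a second, unique field. The example below is plain JavaScript over an in-memory array rather than a real collection; the sample documents, the numeric date values, and the helper names byDateThenIdDesc and nextPage are all hypothetical. It only illustrates the idea of paging on the pair (date, _id) instead of on date alone.

```javascript
// In-memory stand-in for a collection. Dates are small integers for readability;
// three documents deliberately share the same date.
const docs = [
  { _id: 1, date: 5 },
  { _id: 2, date: 4 },
  { _id: 3, date: 4 },
  { _id: 4, date: 4 },
  { _id: 5, date: 3 },
  { _id: 6, date: 2 },
];

// Order by date descending, then by _id descending so ties have a fixed order.
function byDateThenIdDesc(a, b) {
  return b.date - a.date || b._id - a._id;
}

// Return the page that follows `last` (the final document of the previous page),
// or the first page when `last` is null.
function nextPage(collection, last, pageSize) {
  return collection
    .filter(d =>
      last === null ||
      d.date < last.date ||
      (d.date === last.date && d._id < last._id))
    .sort(byDateThenIdDesc)
    .slice(0, pageSize);
}

const page1 = nextPage(docs, null, 2);
const page2 = nextPage(docs, page1[page1.length - 1], 2);
const page3 = nextPage(docs, page2[page2.length - 1], 2);
```

Against a real collection, the same condition could presumably be written with $or, combining {date: {$lt: last_date}} with {date: last_date, _id: {$lt: last_id}}, and sorted on {date: -1, _id: -1}. Treat that as an outline to verify rather than a tested query.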
5.3 Aggregating orders
You've already seen a basic example of MongoDB's aggregation in the count command, which you used for pagination. Most databases provide count plus a lot of other built-in aggregation functions for calculating sums, averages, variances, and the like. These features are on the MongoDB roadmap, but until they're implemented, you can use group and map-reduce to script any aggregate function, from simple sums to standard deviations.
5.3.1 Grouping reviews by user
It's common to want to know which users provide the most valuable reviews. Since the application allows users to vote on reviews, it's technically possible to calculate the total number of votes for all of a user's reviews along with the average number of votes a user receives per review. Though you could get these stats by querying all reviews and doing some basic client-side processing, you can also use MongoDB's group command to get the result from the server.
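For comparison, the client-side approach mentioned above might look roughly like the following. This is a minimal sketch in plain JavaScript: the sample review documents, the votes field name, and the statsByUser helper are assumptions made for illustration, and the array stands in for the result of a query that fetches every review.

```javascript
// Hypothetical review documents; only user_id comes from the text,
// the `votes` field name is assumed.
const reviews = [
  { user_id: "u1", votes: 3 },
  { user_id: "u1", votes: 5 },
  { user_id: "u2", votes: 4 },
];

// Accumulate a review count and a vote total per user,
// then derive the average number of votes per review.
function statsByUser(allReviews) {
  const stats = new Map();
  for (const review of allReviews) {
    const entry = stats.get(review.user_id) || { reviews: 0, votes: 0 };
    entry.reviews += 1;
    entry.votes += review.votes;
    stats.set(review.user_id, entry);
  }
  for (const entry of stats.values()) {
    entry.average = entry.votes / entry.reviews;
  }
  return stats;
}

const stats = statsByUser(reviews);
```

The drawback is that every review has to travel over the network before any arithmetic happens, which is the cost that running the aggregation on the server avoids.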
group takes a minimum of three arguments. The first, key, defines how your data will be grouped. In this case, you want your results to be grouped by user, so your grouping key is user_id. The second argument, known as the reduce function, is a JavaScript function that aggregates over a result set. The final argument to group is an initial document for the reduce function.
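To make the roles of the three arguments concrete, here is a small in-memory model of how they interact. It is a sketch, not MongoDB's implementation: groupInMemory, sampleReviews, and countReviews are hypothetical names, and the reduce function here only counts documents.

```javascript
// Hypothetical model of group: one aggregator per distinct key value,
// each seeded from the initial document and updated by the reduce function.
function groupInMemory(documents, key, reduceFn, initialDoc) {
  const groups = new Map();
  for (const doc of documents) {
    const keyValue = doc[key];
    if (!groups.has(keyValue)) {
      // Shallow copy, so each group gets its own aggregator
      // (sufficient for a flat initial document).
      groups.set(keyValue, Object.assign({ [key]: keyValue }, initialDoc));
    }
    // The reduce function mutates the group's aggregator in place.
    reduceFn(doc, groups.get(keyValue));
  }
  return Array.from(groups.values());
}

const sampleReviews = [
  { user_id: "a" },
  { user_id: "b" },
  { user_id: "a" },
];

// A reduce function receives the current document and the running aggregator.
const countReviews = function(doc, aggregator) {
  aggregator.count += 1;
};

const grouped = groupInMemory(sampleReviews, "user_id", countReviews, { count: 0 });
```

Each element of grouped carries the key value alongside the fields from the initial document, so the initial document effectively defines the shape of every result.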
This sounds more complicated than it is. To see why, let's look more closely at the initial document you'll use and at its corresponding reduce function:
initial = {review: 0, votes: 0};
reduce = function(doc, aggregator) {