If you'll be using the temporary collection regularly, you may want to give it a better
name. You can specify a more human-readable name with the "out" option, which takes
a string. If you specify "out", you need not specify "keeptemp" : true, because it is
implied.
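As a rough illustration of the "out" option, the sketch below builds a command document with a named output collection. The map and reduce bodies, the "url" field, and the collection name "pageview_totals" are all hypothetical and chosen only for the example.

```javascript
// Hypothetical map and reduce functions that count page views per URL.
// emit() is provided by the server when it runs map; it is not defined here.
const map = function () { emit(this.url, 1); };
const reduce = function (key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
};

// Command document with a named output collection.
// "pageview_totals" is a made-up name; "keeptemp" is left out on purpose.
const command = {
  "mapreduce": "analytics",
  "map": map,
  "reduce": reduce,
  "out": "pageview_totals"
};

// In the shell, this document would be passed to db.runCommand(command).
```

Because "out" is given, the results would be kept under that name rather than under an autogenerated one.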
Even if you specify a “pretty” name for the collection, MongoDB will use the autogenerated
collection name for intermediate steps of the MapReduce. When it has finished,
it will automatically and atomically rename the collection from the autogenerated name
to your chosen name. This means that if you run MapReduce multiple times with the
same target collection, you will never be using an incomplete collection for operations.
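The build-then-rename idea can be pictured with a small in-memory model. This is only a sketch of the pattern described above, not MongoDB's implementation: the Map standing in for a database, the writeOutput helper, and the "tmp.mr." naming scheme are all assumptions made for the example.

```javascript
// In-memory stand-in for a database: collection name -> array of documents.
const db = new Map();
db.set("pageview_totals", [{ _id: "/home", value: 1 }]); // results of an earlier run

// Hypothetical helper: build results under a temporary name, then swap them in.
function writeOutput(db, target, results) {
  const tempName = "tmp.mr." + target; // made-up naming scheme
  db.set(tempName, []);
  for (const doc of results) {
    // Intermediate writes go to the temporary collection only;
    // readers of the target still see the previous, complete results.
    db.get(tempName).push(doc);
  }
  // One final step replaces the target with the finished results.
  db.set(target, db.get(tempName));
  db.delete(tempName);
}

writeOutput(db, "pageview_totals", [
  { _id: "/home", value: 3 },
  { _id: "/about", value: 2 },
]);
```

At no point in this model does "pageview_totals" hold a half-written result: it holds either the old documents or the new ones.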
The output collection created by MapReduce is a normal collection, which means that
there is no problem with doing a MapReduce on it, or a MapReduce on the results from
that MapReduce, ad infinitum!
MapReduce on a subset of documents
Sometimes you need to run MapReduce on only part of a collection. You can add a
query to filter the documents before they are passed to the map function.
Every document passed to the map function needs to be deserialized from BSON into a
JavaScript object, which is a fairly expensive operation. If you know that you will need
to run MapReduce only on a subset of the documents in the collection, adding a filter
can greatly speed up the command. The filter is specified by the "query", "limit", and
"sort" keys.
The "query" key takes a query document as a value. Any documents that would ordinarily
be returned by that query will be passed to the map function. For example, if we
have an application tracking analytics and want a summary for the last week, we can
use MapReduce on only the most recent week's documents with the following
command:
> db.runCommand({"mapreduce" : "analytics", "map" : map, "reduce" : reduce,
"query" : {"date" : {"$gt" : week_ago}}})
The "sort" option is mostly useful in conjunction with "limit". "limit" can be used on its
own, as well, to simply provide a cutoff on the number of documents sent to the map
function.
If, in the previous example, we wanted an analysis of the last 10,000 page views (instead
of the last week), we could use "limit" and "sort":
> db.runCommand({"mapreduce" : "analytics", "map" : map, "reduce" : reduce,
"limit" : 10000, "sort" : {"date" : -1}})
"query", "limit", and "sort" can be used in any combination, but "sort" isn't useful if
"limit" isn't present.
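To make the effect of these keys concrete, here is a small in-memory sketch in plain JavaScript. It does not talk to MongoDB: the sample documents, the selectForMap and mapReduce helpers, and passing emit as an argument to map are all simplifications assumed for illustration. It applies the filter in the order query, then sort, then limit, and runs a tiny MapReduce over whatever remains.

```javascript
// In-memory stand-in for the "analytics" collection; field names are hypothetical.
const DAY_MS = 24 * 60 * 60 * 1000;
const now = Date.now();
const daysAgo = (n) => new Date(now - n * DAY_MS);

const analytics = [
  { url: "/home",  date: daysAgo(10) },
  { url: "/about", date: daysAgo(8) },
  { url: "/home",  date: daysAgo(3) },
  { url: "/docs",  date: daysAgo(2) },
  { url: "/home",  date: daysAgo(1) },
];

// Apply the filter in the order query, then sort, then limit.
function selectForMap(docs, { query = () => true, sort = null, limit = Infinity } = {}) {
  let selected = docs.filter(query);
  if (sort) selected = [...selected].sort(sort); // copy first so the input is not mutated
  return selected.slice(0, limit);
}

// Tiny MapReduce over the selected documents only.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

// Count page views per URL.
const map = (doc, emit) => emit(doc.url, 1);
const reduce = (key, values) => values.reduce((a, b) => a + b, 0);

// "query" alone: page views from the last week.
const week_ago = daysAgo(7);
const lastWeek = mapReduce(
  selectForMap(analytics, { query: (doc) => doc.date > week_ago }),
  map, reduce);

// "sort" plus "limit": the two most recent page views.
const newestTwo = mapReduce(
  selectForMap(analytics, { sort: (a, b) => b.date - a.date, limit: 2 }),
  map, reduce);

console.log(lastWeek);
console.log(newestTwo);
```

With this sample data, the week-long query passes three documents to map, while the sorted and limited run passes only the two newest. Without the sort, a limit of 2 would simply take the first two documents in stored order, which is why "sort" matters mainly when "limit" is present.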
Using a scope
MapReduce can take a code type for the map, reduce, and finalize functions, and, in
most languages, you can specify a scope to be passed with code. However, MapReduce