If you'll be using the temporary collection regularly, you may want to give it a better
name. You can specify a more human-readable name with the "out" option, which takes
a string. If you specify "out", you need not specify "keeptemp" : true, because it is implied.
Even if you specify a “pretty” name for the collection, MongoDB will use the autogenerated
collection name for intermediate steps of the MapReduce. When it has finished,
it will automatically and atomically rename the collection from the autogenerated name
to your chosen name. This means that if you run MapReduce multiple times with the
same target collection, operations on that collection will never see an incomplete result.
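The following sketch models the build-under-a-temporary-name-then-rename behavior in plain JavaScript, runnable with Node.js and without a MongoDB server. The db Map, the runMapReduce and mapReduceInto helpers, and the "tmp.mr.N" naming scheme are hypothetical stand-ins chosen for illustration; MongoDB's real temporary names and its server-side rename are not modeled, only the idea that results are built under one name and published under another in a single step.

```javascript
// Hypothetical in-memory stand-in for a database: collection name -> array of documents.
const db = new Map();
let tmpCounter = 0;

// Minimal MapReduce over an array. map(doc, emit) is a simplification of
// MongoDB's map function, which reads the current document from "this".
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

// Builds results under an autogenerated name, then moves them to outName in
// one step, so anyone reading outName sees either the old or the new results.
function mapReduceInto(store, source, map, reduce, outName) {
  const tmpName = `tmp.mr.${++tmpCounter}`; // hypothetical naming scheme
  store.set(tmpName, []);
  for (const result of runMapReduce(store.get(source), map, reduce)) {
    store.get(tmpName).push(result); // partial output is only visible under tmpName
  }
  store.set(outName, store.get(tmpName)); // the "rename": a single reference swap
  store.delete(tmpName);
  return outName;
}

// Usage: count page views per URL into a collection named "pageCounts".
db.set("analytics", [{ url: "/a" }, { url: "/b" }, { url: "/a" }]);
const countMap = (doc, emit) => emit(doc.url, 1);
const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);
mapReduceInto(db, "analytics", countMap, sumReduce, "pageCounts");
// db.get("pageCounts") -> [{ _id: "/a", value: 2 }, { _id: "/b", value: 1 }]
// db.has("tmp.mr.1")   -> false
```

Running mapReduceInto again with the same "pageCounts" target replaces the old results in one swap, which is the property the paragraph above describes.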
The output collection created by MapReduce is a normal collection, which means that
there is no problem with doing a MapReduce on it, or a MapReduce on the results from
that MapReduce, ad infinitum!
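Because the output is just another collection, it can be fed straight back into MapReduce. The sketch below shows the idea with a hypothetical in-memory mapReduce helper in plain JavaScript (not the MongoDB shell or a driver): the first pass counts page views per URL, and the second pass runs over those results to count how many URLs received each number of views.

```javascript
// Minimal in-memory MapReduce (a simplified stand-in, not MongoDB itself).
// map(doc, emit) emits key/value pairs; reduce(key, values) folds the values
// for one key. Results use the { _id, value } shape of MapReduce output documents.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const sum = (key, values) => values.reduce((a, b) => a + b, 0);

const pageViews = [
  { url: "/home" }, { url: "/about" }, { url: "/home" }, { url: "/blog" },
  { url: "/home" }, { url: "/blog" }, { url: "/contact" },
];

// Pass 1: page views per URL.
const viewsPerUrl = mapReduce(pageViews, (doc, emit) => emit(doc.url, 1), sum);
// -> [{ _id: "/home", value: 3 }, { _id: "/about", value: 1 },
//     { _id: "/blog", value: 2 }, { _id: "/contact", value: 1 }]

// Pass 2: MapReduce over pass 1's output, counting how many URLs
// received each number of views.
const urlsPerViewCount = mapReduce(viewsPerUrl, (doc, emit) => emit(doc.value, 1), sum);
// -> [{ _id: 3, value: 1 }, { _id: 1, value: 2 }, { _id: 2, value: 1 }]
```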
MapReduce on a subset of documents
Sometimes you need to run MapReduce on only part of a collection. You can add a
query to filter the documents before they are passed to the map function.
Every document passed to the map function needs to be deserialized from BSON into a
JavaScript object, which is a fairly expensive operation. If you know that you will need
to run MapReduce only on a subset of the documents in the collection, adding a filter
can greatly speed up the command. The filter is specified by the "query", "limit", and
"sort" keys.
The "query" key takes a query document as a value. Any documents that would ordinarily
be returned by that query will be passed to the map function. For example, if we
have an application tracking analytics and want a summary for the last week, we can
use MapReduce on only the most recent week's documents with the following
command:
> db.runCommand({"mapreduce" : "analytics", "map" : map, "reduce" : reduce,
"query" : {"date" : {"$gt" : week_ago}}})
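To show what the "query" filter buys, here is a minimal in-memory sketch in plain JavaScript. It assumes a hypothetical mapReduce helper in which "query" is a predicate function rather than a real MongoDB query document, and it counts how many documents actually reach map; in MongoDB, each of those would first have to be deserialized from BSON.

```javascript
// Simplified sketch: "query" is a predicate function standing in for a MongoDB
// query document such as {"date" : {"$gt" : week_ago}}.
function mapReduce(docs, map, reduce, { query = () => true } = {}) {
  const groups = new Map();
  let mapCalls = 0; // how many documents actually reached map
  for (const doc of docs) {
    if (!query(doc)) continue; // filtered out before map ever sees it
    mapCalls++;
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const results = [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
  return { results, mapCalls };
}

const DAY = 24 * 60 * 60 * 1000;
const now = Date.UTC(2011, 0, 15); // fixed "current time" keeps the example deterministic
const week_ago = new Date(now - 7 * DAY);

const analytics = [
  { url: "/home", date: new Date(now - 1 * DAY) },
  { url: "/blog", date: new Date(now - 3 * DAY) },
  { url: "/home", date: new Date(now - 10 * DAY) }, // older than a week
  { url: "/home", date: new Date(now - 2 * DAY) },
];

const { results, mapCalls } = mapReduce(
  analytics,
  (doc, emit) => emit(doc.url, 1),
  (key, values) => values.reduce((a, b) => a + b, 0),
  { query: (doc) => doc.date > week_ago } // plays the role of {"$gt" : week_ago}
);
// results  -> [{ _id: "/home", value: 2 }, { _id: "/blog", value: 1 }]
// mapCalls -> 3 (the 10-day-old document was filtered out before map)
```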
The "sort" option is mostly useful in conjunction with "limit". The "limit" option can also
be used on its own, to simply provide a cutoff on the number of documents sent to the map
function.
If, in the previous example, we wanted an analysis of the last 10,000 page views (instead
of the last week), we could use "limit" and "sort":
> db.runCommand({"mapreduce" : "analytics", "map" : map, "reduce" : reduce,
"limit" : 10000, "sort" : {"date" : -1}})
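The same kind of hypothetical in-memory helper can model "sort" and "limit": the input is ordered first, then cut off, and only the surviving documents reach map. In this plain JavaScript sketch, "sort" is a comparator function standing in for a sort document such as {"date" : -1}, and the limit is scaled down from 10,000 to 3 so the result is easy to check by hand.

```javascript
// Simplified sketch: "sort" is a comparator standing in for a sort document
// like {"date" : -1}, and "limit" caps how many documents reach map.
function mapReduce(docs, map, reduce, { sort, limit } = {}) {
  let input = docs;
  if (sort) input = [...input].sort(sort); // order the input first...
  if (limit !== undefined) input = input.slice(0, limit); // ...then apply the cutoff
  const groups = new Map();
  for (const doc of input) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const DAY = 24 * 60 * 60 * 1000;
const now = Date.UTC(2011, 0, 15); // fixed "current time" keeps the example deterministic
const analytics = [
  { url: "/home", date: new Date(now - 1 * DAY) },
  { url: "/blog", date: new Date(now - 6 * DAY) },
  { url: "/home", date: new Date(now - 2 * DAY) },
  { url: "/about", date: new Date(now - 5 * DAY) },
  { url: "/blog", date: new Date(now - 3 * DAY) },
];

const countMap = (doc, emit) => emit(doc.url, 1);
const sum = (key, values) => values.reduce((a, b) => a + b, 0);

// The 3 most recent page views (a scaled-down version of "limit" : 10000).
const recent = mapReduce(analytics, countMap, sum, {
  sort: (a, b) => b.date - a.date, // newest first, like {"date" : -1}
  limit: 3,
});
// recent -> [{ _id: "/home", value: 2 }, { _id: "/blog", value: 1 }]
```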
The "query", "limit", and "sort" options can be used in any combination, but "sort" isn't
useful unless "limit" is also present.
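One way to see why "sort" matters only alongside "limit": a well-behaved reduce function should give the same result whatever order its values arrive in, so sorting on its own changes only the order in which documents are processed. The plain JavaScript sketch below reuses the same kind of hypothetical in-memory helper (not MongoDB itself) and compares per-URL totals with and without a limit.

```javascript
// Same simplified in-memory helper: "sort" is a comparator, "limit" a cutoff.
function mapReduce(docs, map, reduce, { sort, limit } = {}) {
  let input = docs;
  if (sort) input = [...input].sort(sort);
  if (limit !== undefined) input = input.slice(0, limit);
  const groups = new Map();
  for (const doc of input) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

// Order results by _id so only the computed values are compared.
const byId = (results) =>
  [...results].sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));

const DAY = 24 * 60 * 60 * 1000;
const now = Date.UTC(2011, 0, 15);
const analytics = [
  { url: "/home", date: new Date(now - 1 * DAY) },
  { url: "/blog", date: new Date(now - 6 * DAY) },
  { url: "/home", date: new Date(now - 2 * DAY) },
  { url: "/about", date: new Date(now - 5 * DAY) },
  { url: "/blog", date: new Date(now - 3 * DAY) },
];

const countMap = (doc, emit) => emit(doc.url, 1);
const sum = (key, values) => values.reduce((a, b) => a + b, 0);
const newestFirst = (a, b) => b.date - a.date;
const oldestFirst = (a, b) => a.date - b.date;

// Without a limit, every document is processed whatever the order,
// so all three runs produce the same totals.
const noSort = byId(mapReduce(analytics, countMap, sum));
const newest = byId(mapReduce(analytics, countMap, sum, { sort: newestFirst }));
const oldest = byId(mapReduce(analytics, countMap, sum, { sort: oldestFirst }));
// each -> [{ _id: "/about", value: 1 }, { _id: "/blog", value: 2 }, { _id: "/home", value: 2 }]

// With a limit, the sort decides which documents are kept at all.
const newest3 = byId(mapReduce(analytics, countMap, sum, { sort: newestFirst, limit: 3 }));
const oldest3 = byId(mapReduce(analytics, countMap, sum, { sort: oldestFirst, limit: 3 }));
// newest3 -> [{ _id: "/blog", value: 1 }, { _id: "/home", value: 2 }]
// oldest3 -> [{ _id: "/about", value: 1 }, { _id: "/blog", value: 2 }]
```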
Using a scope
MapReduce can take a code type for the map, reduce, and finalize functions, and, in
most languages, you can specify a scope to be passed with code. However, MapReduce