Queries and aggregation - MongoDB in Action

• sort: A sort to be applied to the query. This is most useful when used in conjunction with the limit option. That way, you could run map-reduce on the 1,000 most recently created documents.

• limit: An integer specifying a limit to be applied to the query and sort.

• out: This parameter determines how the output is returned. To return all output as the result of the command itself, pass {inline: 1} as the value. Note that this works only when the result set fits within the 16 MB return limit.

The other option is to place the results into an output collection. To do this, the value of out must be a string identifying the name of the collection where the results are to be stored.

One problem with writing to an output collection is that you may overwrite existing data if you've recently run a similar map-reduce. Therefore, two other collection output options exist: one for merging the results with the old data and another for reducing against the data. In the merge case, notated as {merge: "collectionName"}, the new results overwrite any existing items having the same key. In the reduce case, {reduce: "collectionName"}, existing keys' values are reduced against the new values using the reduce function. The reduce output method is especially helpful for performing iterative map-reduce, where you want to integrate new data into an existing aggregation. When you run the new map-reduce against the collection, you simply add a query selector that limits the aggregation to the new data.

• finalize: A JavaScript function to be applied to each resulting document after the reduce phase is complete.

• scope: A document that specifies values for variables to be globally accessible by the map, reduce, and finalize functions.

• verbose: A Boolean that, when true, includes statistics on the execution time of the map-reduce job in the command's return document.

Alas, there's one important limitation to be aware of when thinking about MongoDB's map-reduce and group: speed. On large data sets, these aggregation functions often won't perform as quickly as some users may need. This can be blamed almost entirely on MongoDB's JavaScript engine. It's hard to achieve high performance with a JavaScript engine that runs single-threaded and interpreted (not compiled).

But despair not. map-reduce and group are widely used and adequate in a lot of situations. For the cases where they're not, an alternative and a hope for the future exist. The alternative is to run aggregations elsewhere. Users with especially large data sets have experienced great success running the data through a Hadoop cluster. The hope for the future is a newer set of aggregation functions that use compiled, multithreaded code. These are planned to be released some time after MongoDB v2.0; you can track progress at https://jira.mongodb.org/browse/SERVER-447.
