Queries and aggregation - MongoDB in Action

5.4.4 Map-reduce

You may be wondering why MongoDB would support both group and map-reduce,
since they provide such similar functionality. In fact, group preceded map-reduce as
MongoDB's sole aggregator. map-reduce was added later on for a couple of related
reasons. First, operations in the map-reduce style were becoming more mainstream,
and it seemed prudent to integrate this budding way of thinking into the product.[11]
The second reason was much more practical: iterating over large data sets, especially
in a sharded configuration, required a distributed aggregator. Map-reduce (the
paradigm) provided just that.

map-reduce includes many options. Here they are in all their byzantine detail:

■ map: A JavaScript function to be applied to each document. This function must
  call emit() to select the keys and values to be aggregated. Within the function
  context, the value of this is a reference to the current document. So, for
  example, if you wanted to group your results by user ID and produce totals on a
  vote count and document count, then your map function would look like this:

    function() {
      emit(this.user_id, {vote_sum: this.vote_count, doc_count: 1});
    }

■ reduce: A JavaScript function that receives a key and a list of values. This
  function must always return a value having the same structure as each of the
  values provided in the values array. A reduce function typically iterates over
  the list of values and aggregates them in the process. Sticking to our example,
  here's how you'd reduce the mapped values:

    function(key, values) {
      var vote_sum = 0;
      var doc_count = 0;
      values.forEach(function(value) {
        vote_sum += value.vote_sum;
        // Field names must match what map emits (vote_sum, doc_count).
        doc_count += value.doc_count;
      });
      return {vote_sum: vote_sum, doc_count: doc_count};
    }

  Note that the value of the key parameter frequently isn't used in the
  aggregation itself.

■ query: A query selector that filters the collection to be mapped. This
  parameter serves the same function as group's cond parameter.
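
To see how map, reduce, and query fit together, here is a minimal sketch that runs the two example functions against an in-memory array in plain JavaScript. It is not how MongoDB implements map-reduce: the sample documents, the simulateMapReduce and reduceInChunks helpers, the chunk size, and the use of a predicate function in place of a query selector are all assumptions made for illustration. The chunked reduction mimics the fact that the server may call reduce more than once for the same key and feed earlier results back in, which is why reduce must return the same structure that map emits.

```javascript
// Made-up sample documents; only the field names come from the example above.
const docs = [
  {user_id: "a", vote_count: 3},
  {user_id: "b", vote_count: 1},
  {user_id: "a", vote_count: 2},
  {user_id: "a", vote_count: 0},
];

// Hypothetical stand-in for the server-side emit(): collects values per key.
let groups;
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// The map and reduce functions from the text.
const map = function () {
  emit(this.user_id, {vote_sum: this.vote_count, doc_count: 1});
};

const reduce = function (key, values) {
  let vote_sum = 0;
  let doc_count = 0;
  values.forEach(function (value) {
    vote_sum += value.vote_sum;
    doc_count += value.doc_count;
  });
  return {vote_sum: vote_sum, doc_count: doc_count};
};

// Reduce in small chunks, then reduce the partial results again. This only
// works because reduce returns the same shape as the values it receives.
function reduceInChunks(key, values, reduce, chunkSize) {
  if (values.length === 1) return values[0]; // a single value needs no reduce
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(reduce(key, values.slice(i, i + chunkSize)));
  }
  return partials.length === 1 ? partials[0] : reduce(key, partials);
}

// Illustrative driver: filter (query), map every document, reduce per key.
function simulateMapReduce(docs, map, reduce, query) {
  groups = new Map();
  docs.filter(query).forEach(function (doc) {
    map.call(doc); // inside map, `this` is the current document
  });
  const results = {};
  groups.forEach(function (values, key) {
    results[key] = reduceInChunks(key, values, reduce, 2);
  });
  return results;
}

// No filtering: every document is mapped.
const all = simulateMapReduce(docs, map, reduce, () => true);

// A predicate playing the role of the query parameter: skip zero-vote documents.
const voted = simulateMapReduce(docs, map, reduce, (d) => d.vote_count > 0);

console.log(all);   // a: {vote_sum: 5, doc_count: 3}, b: {vote_sum: 1, doc_count: 1}
console.log(voted); // a: {vote_sum: 5, doc_count: 2}, b: {vote_sum: 1, doc_count: 1}
```

If reduce returned differently named fields than map emits, the second reduction pass over the partial results would read undefined values and produce NaN, which is the practical reason behind the same-structure rule.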

11. A lot of developers first saw map-reduce in a famous paper by Google on distributed computations
(http://labs.google.com/papers/mapreduce.html). The ideas in this paper helped form the basis for Hadoop, an
open source framework that uses distributed map-reduce to process large data sets. The map-reduce idea then
spread. CouchDB, for instance, employed a map-reduce paradigm for declaring indexes.
