Database Reference
In-Depth Information
5.4.4
Map-reduce
You may be wondering why MongoDB would support both group and map-reduce ,
since they provide such similar functionality. In fact, group preceded map-reduce as
MongoDB's sole aggregator. map-reduce was added later on for a couple of related
reasons. First, operations in the map-reduce style were becoming more mainstream,
and it seemed prudent to integrate this budding way of thinking into the product. 11
The second reason was much more practical: iterating over large data sets, especially
in a sharded configuration, required a distributed aggregator. Map-reduce (the para-
digm) provided just that.
map-reduce includes many options. Here they are in all their byzantine detail:
map —A JavaScript function to be applied to each document. This function must
call emit() to select the keys and values to be aggregated. Within the function
context, the value of this is a reference to the current document. So, for exam-
ple, if you wanted to group your results by user ID and produce totals on a vote
count and document count, then your map function would look like this:
function() {
emit(this.user_id, {vote_sum: this.vote_count, doc_count: 1});
}
reduce —A JavaScript function that receives a key and a list of values. This func-
tion must always return a value having the same structure as each of the values
provided in the values array. A reduce function typically iterates over the list of
values and aggregates them in the process. Sticking to our example, here's how
you'd reduce the mapped values:
function(key, values) {
var vote_sum = 0;
var doc_sum = 0;
values.forEach(function(value) {
vote_sum += value.vote_sum;
doc_sum += value.doc_sum;
});
return {vote_sum: vote_sum, doc_sum: doc_sum};
}
Note that the value of the key parameter frequently isn't used in the aggrega-
tion itself.
query —A query selector that filters the collection to be mapped. This parame-
ter serves the same function as group 's cond parameter.
11
A lot of developers first saw map-reduce in a famous paper by Google on distributed computations (http://
labs.google.com/papers/mapreduce.html). The ideas in this paper helped form the basis for Hadoop, an
open source framework that uses distributed map-reduce to process large data sets. The map-reduce idea then
spread. CouchDB, for instance, employed a map-reduce paradigm for declaring indexes.
Search WWH ::




Custom Search