MapReduce as a background job, it creates a collection of results, and then you can
query that collection in real time.
We'll go through a couple of MapReduce examples because it is an incredibly useful and
powerful tool, but also a somewhat complex one.
Example 1: Finding All Keys in a Collection
Using MapReduce for this problem might be overkill, but it is a good way to get familiar
with how MapReduce works. If you already understand MapReduce, feel free to skip
ahead to the last part of this section, where we cover MongoDB-specific MapReduce
considerations.
MongoDB is schemaless, so it does not keep track of the keys in each document. The
best way, in general, to find all the keys across all the documents in a collection is to
use MapReduce. In this example, we'll also get a count of how many times each key
appears in the collection. This example doesn't include keys for embedded documents,
but it would be a simple addition to the map function to do so.
For the mapping step, we want to get every key of every document in the collection.
The map function uses a special function to “return” values that we want to process
later: emit. emit gives MapReduce a key (like the one used by group earlier) and a value.
In this case, we emit a count of how many times a given key appeared in the document
(once: {count : 1}). We want a separate count for each key, so we'll call emit for every
key in the document. Inside map, this is a reference to the current document we are mapping:
> map = function() {
...     for (var key in this) {
...         emit(key, {count : 1});
...     }
... };
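To see what the map step produces without a running server, the function can be exercised in plain JavaScript. The following is a minimal sketch, not MongoDB's actual machinery: the emit defined here is a stand-in that only records its arguments, the sample documents are made up, and mapNested is a hypothetical variant illustrating one way the "simple addition" for embedded documents might look, using dotted key names.

```javascript
// Stand-in for the emit function that MongoDB provides inside map.
// This version only records each (key, value) pair for inspection.
var emitted = [];
function emit(key, value) {
    emitted.push({key: key, value: value});
}

// The map function from the text.
var map = function() {
    for (var key in this) {
        emit(key, {count : 1});
    }
};

// Hypothetical variant that also descends into embedded documents,
// emitting dotted names such as "address.city". It treats every
// non-array object as an embedded document, which is an assumption
// that may need refining for special value types.
var mapNested = function() {
    var walk = function(doc, prefix) {
        for (var key in doc) {
            var name = prefix + key;
            emit(name, {count : 1});
            var value = doc[key];
            if (value !== null && typeof value === "object" &&
                    !Array.isArray(value)) {
                walk(value, name + ".");
            }
        }
    };
    walk(this, "");
};

// Made-up sample documents.
var docs = [
    {_id : 1, x : 1, address : {city : "Oslo"}},
    {_id : 2, x : 2, y : 3}
];

// Call map once per document, with "this" bound to that document.
docs.forEach(function(doc) { map.call(doc); });
var flatKeys = emitted.map(function(e) { return e.key; });

// Reset the recorder and try the nested variant on the first document.
emitted = [];
mapNested.call(docs[0]);
var nestedKeys = emitted.map(function(e) { return e.key; });
```

With the top-level map, the embedded address document contributes only the key address. The nested variant additionally emits address.city.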
Now we have a ton of little {count : 1} documents floating around, each associated
with a key from the collection. The reduce function is passed two arguments:
key, which is the first argument from emit, and an array of one or more {count : 1}
documents that were emitted for that key:
> reduce = function(key, emits) {
...     var total = 0;
...     for (var i in emits) {
...         total += emits[i].count;
...     }
...     return {"count" : total};
... };
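A reduce function like this one can be sanity-checked in plain JavaScript before handing it to MongoDB. The sketch below assumes a hypothetical key x with three emitted documents. It compares reducing all of them in one call against reducing a subset first and feeding that partial result back in, since reduce may receive its own earlier output as input.

```javascript
// The reduce function from the text (total is declared with var
// here so that it stays local to the function).
var reduce = function(key, emits) {
    var total = 0;
    for (var i = 0; i < emits.length; i++) {
        total += emits[i].count;
    }
    return {"count" : total};
};

// Three emitted documents for the hypothetical key "x".
var emits = [{count : 1}, {count : 1}, {count : 1}];

// Reduce everything in a single call.
var allAtOnce = reduce("x", emits);

// Reduce the first two, then pass that result back to reduce
// together with the remaining emitted document.
var partial = reduce("x", [emits[0], emits[1]]);
var inStages = reduce("x", [partial, emits[2]]);

console.log(allAtOnce.count === inStages.count);
```

The two paths agree because the return value has the same shape as the emitted values, a document with a count field, so it can appear in the emits array like any other element.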
reduce must be callable repeatedly on results from either the map phase or
previous reduce phases. Therefore, reduce must return a document that can be re-sent
to reduce as an element of its second argument. For example, say we have the key x
mapped to three documents: {count : 1, id : 1}, {count : 1, id : 2}, and {count :
 