Setting Up Testing Documents
To begin with, we need to set up some documents to test with. We've created a mapreduce collection that is part of the test database you restored earlier. If you haven't restored it yet, extract the archive with the following command:
$ tar -xvf test.tgz
x test/
x test/aggregation.bson
x test/aggregation.metadata.json
x test/mapreduce.bson
x test/mapreduce.metadata.json
Then run the mongorestore command to restore the test database:
$ mongorestore test
connected to: 127.0.0.1
Sun Jul 21 19:26:21.342 test/aggregation.bson
Sun Jul 21 19:26:21.342 going into namespace [test.aggregation]
1000 objects found
Sun Jul 21 19:26:21.350 Creating index: { key: { _id: 1 }, ns: "test.aggregation", name: "_id_" }
Sun Jul 21 19:26:21.688 test/mapreduce.bson
Sun Jul 21 19:26:21.689 going into namespace [test.mapreduce]
1000 objects found
Sun Jul 21 19:26:21.695 Creating index: { key: { _id: 1 }, ns: "test.mapreduce", name: "_id_" }
This will give you a collection of documents to use in working with MapReduce. To begin, let's look at the world's
simplest map function.
Working with Map Functions
This function will “emit” the color and the num value from each document in the mapreduce collection. These two fields will be output in key/value form, with the first argument (color) as the key and the second argument (num) as the value. This is a lot to take in at first, so take a look at the simple map function that performs this emit:
var map = function() {
    emit(this.color, this.num);
};
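To make the key/value behavior concrete, here is a minimal sketch that imitates the emit step in plain JavaScript, outside the database. The emit stand-in, the emitted array, and the three sample documents are assumptions made purely for illustration; they only approximate what MongoDB does when it calls map once for each document, with this bound to that document.

```javascript
// Stand-in for emit(): collect each key/value pair (illustration only).
var emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

// The map function from the text, unchanged.
var map = function() {
    emit(this.color, this.num);
};

// Hypothetical sample documents; the values are made up for this sketch.
var sampleDocs = [
    { color: "blue", num: 1 },
    { color: "green", num: 2 },
    { color: "blue", num: 3 }
];

// Call map once per document, with `this` bound to that document.
sampleDocs.forEach(function(doc) {
    map.call(doc);
});

console.log(emitted);
```

Each sample document yields exactly one pair, so emitted ends up holding three entries, two of which share the key "blue". Values that share a key are what a reduce function is later handed as a group.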
To run a MapReduce we also need a reduce function, but before doing anything fancy, let's see what an empty reduce function produces to get an idea of what happens.
var reduce = function(color, numbers) { };
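Before running anything against the database, it can help to call the empty function by hand. The sketch below is plain JavaScript; the key "blue" and the values array are hypothetical inputs chosen for illustration. It shows the shape of the arguments (one key plus an array of the values emitted for that key) and what a function with an empty body returns on its own. It does not show how MongoDB presents that result.

```javascript
// The empty reduce function from the text.
var reduce = function(color, numbers) { };

// Hypothetical input: one key and the values emitted for that key.
var result = reduce("blue", [1, 3]);

// A JavaScript function with no return statement returns undefined.
console.log(result); // undefined
```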
Enter both these commands into your shell, and you'll have just about all you need to run our MapReduce.
The last thing you will need to provide is an output string for the MapReduce to use. This string defines where the
output for this MapReduce command should be put. The two most common options are:
• To a collection
• To the console (inline)
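As a rough sketch, the two options might be written as follows. The collection name mrresult and the variable names are assumptions chosen for illustration. Note that the inline form is given as a small document rather than a plain string. The guarded call at the end only runs inside the mongo shell, where db is defined, and assumes the map and reduce functions have already been entered.

```javascript
// Option 1: write the results to a collection.
// "mrresult" is a hypothetical collection name.
var outToCollection = { out: "mrresult" };

// Option 2: return the results inline, straight to the console.
var outInline = { out: { inline: 1 } };

// Runs only in the mongo shell, where `db` exists and map/reduce are defined.
if (typeof db !== "undefined") {
    var inlineResult = db.mapreduce.mapReduce(map, reduce, outInline);
    printjson(inlineResult);
}
```

Inline output is convenient while experimenting, since nothing is written to the database. Writing to a collection keeps the results around so they can be queried later.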