Listing 3.7 MapReduce function that lists the most mentioned users
> /*
>  * This function extracts each user mentioned,
>  * and the count of each mention.
>  * The function takes 0 parameters, as the document
>  * will be passed through context (the 'this' object).
>  */
> var mapFunction = function(){
...     //loop through all of the mentions in the document.
...     var userMentions = this.entities.user_mentions;
...     for(var i = 0; i < userMentions.length; i++){
...         //check that the username is not blank.
...         if(userMentions[i].screen_name.length > 0){
...             //emit the username (key) and
...             //the count (value, in this case always 1).
...             emit(userMentions[i].screen_name, 1);
...         }
...     }
... }
> /*
>  * This function sums the number of mentions of each user
>  */
> var reduceFunction = function(keyUsername, occurs){
...     return Array.sum(occurs);
... }
> // Perform the MapReduce operation, and store the results
> // in a new collection, "most_mentioned_users".
> db.tweets.mapReduce(mapFunction, reduceFunction, {"out": "most_mentioned_users"});
> // List the top 5 most-mentioned users
> db.most_mentioned_users.find().sort({"value": -1}).limit(5)
{ "_id" : "MikeBloomberg", "value" : 727 }
{ "_id" : "OccupyWallSt", "value" : 588 }
{ "_id" : "OccupyWallStNYC", "value" : 428 }
{ "_id" : "JoshHarkinson", "value" : 295 }
{ "_id" : "ydanis", "value" : 260 }
Source: Chapter3/mapreduce.js
In Listing 3.7, the MapReduce job is constructed as follows. The map function, called
mapFunction, looks at each individual Tweet and pulls out the mentioned users. For each
mention it emits a key/value pair to be sent to the reducer: the key is the screen name
of the user that was mentioned, and the value is 1. MongoDB then groups the emitted
values by key and calls the reduce function, reduceFunction, for each unique key, passing
it the key and the list of values emitted for that key. The reducer sums this list. The
result, stored in the most_mentioned_users collection, is one document per mentioned user
holding the number of times that user was mentioned.
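To make the flow described above concrete, the following is a minimal in-memory sketch of the map, group, and reduce steps in plain JavaScript, runnable without a MongoDB server. It is an illustration only and not how MongoDB executes mapReduce internally. The sampleTweets data, the grouped object, and this emit() function are hypothetical stand-ins: in the mongo shell, emit() is supplied by MongoDB and the grouping happens on the server. Array.sum is a mongo shell helper, so the sketch sums with the standard reduce() method instead.

```javascript
// In-memory sketch of the map -> group -> reduce flow in Listing 3.7.
// sampleTweets, grouped, and this emit() are hypothetical stand-ins;
// MongoDB supplies its own emit() and performs the grouping itself.

var sampleTweets = [
  { entities: { user_mentions: [{ screen_name: "alice" }, { screen_name: "bob" }] } },
  { entities: { user_mentions: [{ screen_name: "alice" }] } },
  { entities: { user_mentions: [{ screen_name: "" }] } }, // blank name, skipped by map
  { entities: { user_mentions: [] } }                     // no mentions at all
];

// Emitted values, collected into one list per key.
var grouped = {};
function emit(key, value) {
  if (!Object.prototype.hasOwnProperty.call(grouped, key)) {
    grouped[key] = [];
  }
  grouped[key].push(value);
}

// Same logic as mapFunction in the listing: one (screen_name, 1) pair per mention.
var mapFunction = function () {
  var userMentions = this.entities.user_mentions;
  for (var i = 0; i < userMentions.length; i++) {
    if (userMentions[i].screen_name.length > 0) {
      emit(userMentions[i].screen_name, 1);
    }
  }
};

// Sums the values for one key (replaces the shell-only Array.sum).
var reduceFunction = function (keyUsername, occurs) {
  return occurs.reduce(function (total, n) { return total + n; }, 0);
};

// Map phase: call the map function with each document bound to 'this'.
sampleTweets.forEach(function (tweet) {
  mapFunction.call(tweet);
});

// Reduce phase: one call per unique key, producing { _id, value } documents.
var results = Object.keys(grouped).map(function (username) {
  return { _id: username, value: reduceFunction(username, grouped[username]) };
});

// Equivalent of .sort({"value": -1}).limit(5) in the listing.
results.sort(function (a, b) { return b.value - a.value; });
var top = results.slice(0, 5);

console.log(JSON.stringify(top));
// prints [{"_id":"alice","value":2},{"_id":"bob","value":1}]
```

The sketch calls the reduce function exactly once per key. MongoDB may call it more than once for the same key, feeding earlier results back in as values, which is why a reducer such as this sum must return a value of the same form as the values it receives.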