Database Reference
In-Depth Information
total : 254 ,
count : 10 ,
mean : 25.4 }
}
MapReduce
The MapReduce algorithm and its MongoDB implementation, the mapreduce command, is a
popular way to process large amounts of data in bulk. If you're not familiar with MapReduce,
the basics are illustrated in the following pseudocode:
from
from collections
collections import
import defaultdict
def
def map_reduce ( input , output , query , mapf , reducef , finalizef ):
# Map phase
map_output = []
for
for doc iin input . find ( output ):
map_output += mapf ( doc )
# Shuffle phase
map_output . sort ()
docs_by_key = groupby_keys ( map_output )
# Reduce phase
reduce_output = []
for
for key , values iin docs_by_key :
reduce_output . append ({
'_id' : key ,
'value' : reducef ( key , values ) })
# Finalize phase
finalize_output = []
for
for doc iin reduce_output :
key , value = doc [ '_id' ], doc [ 'value' ]
reduce_output [ key ] = finalizef ( key , value )
output . remove ()
output . insert ( finalize_output )
In MongoDB, mapf actually calls an emit function to generate zero or more docu-
ments to feed into the next phase. The signature of the mapf function is also modi-
fied to take no arguments, passing the document in the this JavaScript keyword.
 
 
 
 
 
Search WWH ::




Custom Search