Operational Intelligence - MongoDB Applied Design Patterns

Database Reference

In-Depth Information

total : 254 ,

count : 10 ,

mean : 25.4 }

}

MapReduce

The MapReduce algorithm and its MongoDB implementation, the mapreduce command, is a

popular way to process large amounts of data in bulk. If you're not familiar with MapReduce,

the basics are illustrated in the following pseudocode:

from

from collections

collections import

import defaultdict

def

def map_reduce ( input , output , query , mapf , reducef , finalizef ):

# Map phase

map_output = []

for

for doc iin input . find ( output ):

map_output += mapf ( doc )

# Shuffle phase

map_output . sort ()

docs_by_key = groupby_keys ( map_output )

# Reduce phase

reduce_output = []

for

for key , values iin docs_by_key :

reduce_output . append ({

'_id' : key ,

'value' : reducef ( key , values ) })

# Finalize phase

finalize_output = []

for

for doc iin reduce_output :

key , value = doc [ '_id' ], doc [ 'value' ]

reduce_output [ key ] = finalizef ( key , value )

output . remove ()

output . insert ( finalize_output )

In MongoDB, mapf actually calls an emit function to generate zero or more docu-

ments to feed into the next phase. The signature of the mapf function is also modi-

fied to take no arguments, passing the document in the this JavaScript keyword.

Search WWH ::

Custom Search

Home