Database Reference
In-Depth Information
total
:
254
,
count
:
10
,
mean
:
25.4
}
}
MapReduce
The MapReduce algorithm and its MongoDB implementation, the
mapreduce
command, is a
popular way to process large amounts of data in bulk. If you're not familiar with MapReduce,
the basics are illustrated in the following pseudocode:
from
from
collections
collections
import
import
defaultdict
def
def
map_reduce
(
input
,
output
,
query
,
mapf
,
reducef
,
finalizef
):
# Map phase
map_output
=
[]
for
for
doc
iin
input
.
find
(
output
):
map_output
+=
mapf
(
doc
)
# Shuffle phase
map_output
.
sort
()
docs_by_key
=
groupby_keys
(
map_output
)
# Reduce phase
reduce_output
=
[]
for
for
key
,
values
iin
docs_by_key
:
reduce_output
.
append
({
'_id'
:
key
,
'value'
:
reducef
(
key
,
values
) })
# Finalize phase
finalize_output
=
[]
for
for
doc
iin
reduce_output
:
key
,
value
=
doc
[
'_id'
],
doc
[
'value'
]
reduce_output
[
key
]
=
finalizef
(
key
,
value
)
output
.
remove
()
output
.
insert
(
finalize_output
)
In MongoDB,
mapf
actually calls an
emit
function to generate zero or more docu-
ments to feed into the next phase. The signature of the
mapf
function is also modi-
fied to take no arguments, passing the document in the
this
JavaScript keyword.