Figure 2-5. MapReduce data flow with no reduce tasks
The contract for the combiner function constrains the type of function that may be used.
This is best illustrated with an example. Suppose that for the maximum temperature
example, readings for the year 1950 were processed by two maps (because they were in
different splits). Imagine the first map produced the output:
(1950, 0)
(1950, 20)
(1950, 10)
and the second produced:
(1950, 25)
(1950, 15)
The reduce function would be called with a list of all the values:
(1950, [0, 20, 10, 25, 15])
with output:
(1950, 25)
since 25 is the maximum value in the list. We could use a combiner function that, just like
the reduce function, finds the maximum temperature for each map output. The reduce
function would then be called with:
(1950, [20, 25])
and would produce the same output as before. More succinctly, we may express the function
calls on the temperature values in this case as follows:
max(0, 20, 10, 25, 15) = max(max(0, 20, 10), max(25, 15)) = max(20, 25) = 25
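The same identity can be checked with a small simulation. The following is only a minimal sketch in plain Python, not Hadoop's actual API: the names max_combiner, shuffle, and max_reducer are hypothetical helpers introduced for illustration, and the shuffle step is assumed to do nothing more than group values by key.

```python
from collections import defaultdict


def max_combiner(pairs):
    """Collapse one map's output to a single (key, max value) pair per key."""
    best = {}
    for key, value in pairs:
        if key not in best or value > best[key]:
            best[key] = value
    return list(best.items())


def shuffle(*map_outputs):
    """Group the values from all map outputs by key (simplified shuffle)."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)
    return dict(grouped)


def max_reducer(key, values):
    """Reduce function: emit the maximum value seen for the key."""
    return (key, max(values))


# Outputs of the two maps from the example above.
map1 = [(1950, 0), (1950, 20), (1950, 10)]
map2 = [(1950, 25), (1950, 15)]

# Reducer input without a combiner, and with the combiner run on each map output.
without_combiner = shuffle(map1, map2)
with_combiner = shuffle(max_combiner(map1), max_combiner(map2))

result_without = [max_reducer(k, v) for k, v in without_combiner.items()]
result_with = [max_reducer(k, v) for k, v in with_combiner.items()]

print(without_combiner)  # {1950: [0, 20, 10, 25, 15]}
print(with_combiner)     # {1950: [20, 25]}
print(result_without == result_with)  # True
```

In this sketch the combiner shrinks the reducer's input from five values to two, while the final result is unchanged.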
Not all functions possess this property. [20] For example, if we were calculating mean
temperatures, we couldn't use the mean as our combiner function, because: