Database Reference
Fig. 9.2 An overview of the flow of execution of a MapReduce operation (input files, map phase, intermediate files on local disk, reduce phase, output files)
4. Periodically, the buffered pairs are written to local disk and partitioned into R
regions by the partitioning function. The locations of these buffered pairs on the
local disk are passed back to the master, who is responsible for forwarding these
locations to the reduce workers.
5. When a reduce worker is notified by the master about these locations, it reads the
buffered data from the local disks of the map workers and then sorts it by the
intermediate keys so that all occurrences of the same key are grouped together.
The sorting operation is needed because typically many different keys map to the
same reduce task.
6. For each unique intermediate key, the reduce worker passes the key and the
corresponding set of intermediate values to the user's Reduce function. The output
of the Reduce function is appended to a final output file for this reduce partition.
7. When all map tasks and reduce tasks have been completed, the master program
wakes up the user program. At this point, the MapReduce invocation in the user
program returns the program control back to the user code.
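Steps 4 to 6 can be illustrated with a small single-process sketch. It is not the actual MapReduce implementation: the partitioning function is assumed to be a hash of the key modulo R, in-memory lists stand in for the files on local disk, and the names partition, spill, run_reduce_task and count_reduce are hypothetical helpers introduced only for this example.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition(key, num_regions):
    """Assumed partitioning function: a stable hash of the key, modulo R."""
    # crc32 is used instead of the built-in hash() so results are repeatable.
    return zlib.crc32(key.encode("utf-8")) % num_regions


def spill(buffered_pairs, num_regions):
    """Step 4: split one map worker's buffered pairs into R regions."""
    regions = [[] for _ in range(num_regions)]
    for key, value in buffered_pairs:
        regions[partition(key, num_regions)].append((key, value))
    return regions


def run_reduce_task(region_index, map_outputs, reduce_fn):
    """Steps 5 and 6: read one region from every map worker, sort, reduce."""
    pairs = []
    for regions in map_outputs:
        pairs.extend(regions[region_index])
    # Sorting by key brings all occurrences of the same key together,
    # even though many different keys share this reduce task.
    pairs.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = [value for _, value in group]
        output.append((key, reduce_fn(key, values)))
    return output


def count_reduce(key, values):
    """Hypothetical user Reduce function: sum the values for one key."""
    return sum(values)


if __name__ == "__main__":
    R = 2  # number of reduce tasks, chosen arbitrarily for the demo
    map_outputs = [
        spill([("apple", 1), ("pear", 1), ("apple", 1)], R),
        spill([("pear", 1), ("fig", 1)], R),
    ]
    for r in range(R):
        print(r, run_reduce_task(r, map_outputs, count_reduce))
```

Because every map worker applies the same partitioning function, all pairs for a given key end up in the same region, so exactly one reduce task sees each key.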
During the execution process, the master pings every worker periodically. If no
response is received from a worker within a certain amount of time, the master
marks the worker as failed. Any map tasks marked completed or in progress by
the worker are reset back to their initial idle state and therefore become eligible
for scheduling by other workers. Completed map tasks are re-executed on a task