Databases Reference
In-Depth Information
User
Program
fork
fork
fork
Master
assign
assign
Reduce
Map
Worker
Worker
Worker
Worker
Input
Worker
Data
Output
File
Intermediate
Files
Figure 2.3: Overview of the execution of a map-reduce program
file(s), but we may wish to create fewer Reduce tasks. The reason for limiting
the number of Reduce tasks is that it is necessary for each Map task to create
an intermediate file for each Reduce task, and if there are too many Reduce
tasks the number of intermediate files explodes.
The Master keeps track of the status of each Map and Reduce task (idle,
executing at a particular Worker, or completed). A Worker process reports to
the Master when it finishes a task, and a new task is scheduled by the Master
for that Worker process.
Each Map task is assigned one or more chunks of the input file(s) and
executes on it the code written by the user. The Map task creates a file for
each Reduce task on the local disk of the Worker that executes the Map task.
The Master is informed of the location and sizes of each of these files, and the
Reduce task for which each is destined. When a Reduce task is assigned by the
Master to a Worker process, that task is given all the files that form its input.
The Reduce task executes code written by the user and writes its output to a
file that is part of the surrounding distributed file system.
2.2.6
Coping With Node Failures
The worst thing that can happen is that the compute node at which the Master
is executing fails. In this case, the entire map-reduce job must be restarted.
But only this one node can bring the entire process down; other failures will be
Search WWH ::




Custom Search