Database Reference
In-Depth Information
Figure 7-2. The relationship of the Streaming executable to the node manager and the task container
Progress and Status Updates
MapReduce jobs are long-running batch jobs, taking anything from tens of seconds to
hours to run. Because this can be a significant length of time, it's important for the user to
get feedback on how the job is progressing. A job and each of its tasks have a status ,
which includes such things as the state of the job or task (e.g., running, successfully com-
pleted, failed), the progress of maps and reduces, the values of the job's counters, and a
status message or description (which may be set by user code). These statuses change over
the course of the job, so how do they get communicated back to the client?
When a task is running, it keeps track of its progress (i.e., the proportion of the task com-
pleted). For map tasks, this is the proportion of the input that has been processed. For re-
duce tasks, it's a little more complex, but the system can still estimate the proportion of
the reduce input processed. It does this by dividing the total progress into three parts, cor-
responding to the three phases of the shuffle (see Shuffle and Sort ). For example, if the
task has run the reducer on half its input, the task's progress is 5/6, since it has completed
the copy and sort phases (1/3 each) and is halfway through the reduce phase (1/6).
WHAT CONSTITUTES PROGRESS IN MAPREDUCE?
Progress is not always measurable, but nevertheless, it tells Hadoop that a task is doing something. For
example, a task writing output records is making progress, even when it cannot be expressed as a per-
centage of the total number that will be written (because the latter figure may not be known, even by the
task producing the output).
Progress reporting is important, as Hadoop will not fail a task that's making progress. All of the follow-
ing operations constitute progress:
▪ Reading an input record (in a mapper or reducer)
▪ Writing an output record (in a mapper or reducer)
▪ Setting the status description (via Reporter 's or TaskAttemptContext 's
setStatus() method)
▪ Incrementing a counter (using Reporter 's incrCounter() method or Counter 's in-
crement() method)
▪ Calling Reporter 's or TaskAttemptContext 's progress() method
Tasks also have a set of counters that count various events as the task runs (we saw an ex-
ample in A test run ), which are either built into the framework, such as the number of map
output records written, or defined by users.
Search WWH ::




Custom Search