Task Execution
We saw how the MapReduce system executes tasks in the context of the overall job at the
beginning of this chapter, in Anatomy of a MapReduce Job Run. In this section, we'll look
at some more controls that MapReduce users have over task execution.
The Task Execution Environment
Hadoop provides information to a map or reduce task about the environment in which it is
running. For example, a map task can discover the name of the file it is processing (see File
information in the mapper), and a map or reduce task can find out the attempt number of
the task. The properties in Table 7-3 can be read from the job's configuration. In the old
MapReduce API, you obtain the configuration by implementing the configure() method
for Mapper or Reducer, which receives the configuration as an argument. In the new
API, these properties can be accessed from the context object passed to all methods of the
Mapper or Reducer.
Table 7-3. Task environment properties
Property name              Type     Description                                  Example
mapreduce.job.id           String   The job ID (see Job, Task, and Task Attempt  job_200811201130_0004
                                    IDs for a description of the format)
mapreduce.task.id          String   The task ID                                  task_200811201130_0004_m_000003
mapreduce.task.attempt.id  String   The task attempt ID                          attempt_200811201130_0004_m_000003_0
mapreduce.task.partition   int      The index of the task within the job         3
mapreduce.task.ismap       boolean  Whether this task is a map task              true
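
The example values in Table 7-3 are related to one another: the task attempt ID appears to embed the job ID, the task ID, the task type, and the task index. The sketch below illustrates that relationship using only the standard Java library. The class name TaskAttemptIdParser is hypothetical, and the ID layout it assumes is inferred from the examples in the table rather than taken from Hadoop's own parsing code. In a running task, the input string would come from the job configuration under mapreduce.task.attempt.id.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: splits a task attempt ID string
// into the values listed in Table 7-3.
class TaskAttemptIdParser {
    // Assumed layout, inferred from the example values in Table 7-3:
    // attempt_<digits>_<job number>_<m or r>_<task number>_<attempt number>
    // ("r" for reduce tasks is an assumption; the table only shows "m").
    private static final Pattern ATTEMPT_ID =
            Pattern.compile("attempt_(\\d+)_(\\d+)_([mr])_(\\d+)_(\\d+)");

    public final String jobId;
    public final String taskId;
    public final int partition;
    public final boolean isMap;
    public final int attemptNumber;

    private TaskAttemptIdParser(String jobId, String taskId, int partition,
                                boolean isMap, int attemptNumber) {
        this.jobId = jobId;
        this.taskId = taskId;
        this.partition = partition;
        this.isMap = isMap;
        this.attemptNumber = attemptNumber;
    }

    public static TaskAttemptIdParser parse(String attemptId) {
        Matcher m = ATTEMPT_ID.matcher(attemptId);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                    "Unrecognized task attempt ID: " + attemptId);
        }
        // The first two numeric fields are shared by the job, task, and attempt IDs.
        String jobPart = m.group(1) + "_" + m.group(2);
        String type = m.group(3);
        String jobId = "job_" + jobPart;
        String taskId = "task_" + jobPart + "_" + type + "_" + m.group(4);
        // The zero-padded task number is read as the task's index within the job.
        int partition = Integer.parseInt(m.group(4));
        int attemptNumber = Integer.parseInt(m.group(5));
        return new TaskAttemptIdParser(
                jobId, taskId, partition, type.equals("m"), attemptNumber);
    }

    public static void main(String[] args) {
        TaskAttemptIdParser id = parse("attempt_200811201130_0004_m_000003_0");
        System.out.println("job ID:         " + id.jobId);
        System.out.println("task ID:        " + id.taskId);
        System.out.println("partition:      " + id.partition);
        System.out.println("is map task:    " + id.isMap);
        System.out.println("attempt number: " + id.attemptNumber);
    }
}
```

In practice a task would normally read each property directly from the configuration rather than re-deriving it. The parsing here only shows how the table's example values line up with one another.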