Task Execution
We saw how the MapReduce system executes tasks in the context of the overall job at the beginning of this chapter, in Anatomy of a MapReduce Job Run. In this section, we'll look at some more controls that MapReduce users have over task execution.
The Task Execution Environment
Hadoop provides information to a map or reduce task about the environment in which it is running. For example, a map task can discover the name of the file it is processing (see File information in the mapper), and a map or reduce task can find out the attempt number of the task. The properties in Table 7-3 can be accessed from the job's configuration, obtained in the old MapReduce API by providing an implementation of the configure() method for Mapper or Reducer, where the configuration is passed in as an argument. In the new API, these properties can be accessed from the context object passed to all methods of the Mapper or Reducer.
Table 7-3. Task environment properties
Property name              Type     Description                           Example
mapreduce.job.id           String   The job ID                            job_200811201130_0004
mapreduce.task.id          String   The task ID                           task_200811201130_0004_m_000003
mapreduce.task.attempt.id  String   The task attempt ID                   attempt_200811201130_0004_m_000003_0
mapreduce.task.partition   int      The index of the task within the job  3
mapreduce.task.ismap       boolean  Whether this task is a map task       true
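
As a rough illustration of reading the properties in Table 7-3, the sketch below collects them into a small value object. The class name TaskEnvironment and its from() method are hypothetical and not part of Hadoop. So that the example runs without Hadoop on the classpath, the configuration is represented by a plain name-to-value lookup. In a new-API task, that lookup would presumably be context.getConfiguration()::get.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical holder for the task environment properties in Table 7-3. */
final class TaskEnvironment {
    final String jobId;
    final String taskId;
    final String attemptId;
    final int partition;
    final boolean isMap;

    private TaskEnvironment(String jobId, String taskId, String attemptId,
                            int partition, boolean isMap) {
        this.jobId = jobId;
        this.taskId = taskId;
        this.attemptId = attemptId;
        this.partition = partition;
        this.isMap = isMap;
    }

    /**
     * Reads the five properties through a name-to-value lookup.
     * Assumption: the lookup stands in for the job configuration, so a
     * plain Map can be used here instead of a Hadoop Configuration.
     */
    static TaskEnvironment from(Function<String, String> lookup) {
        return new TaskEnvironment(
            lookup.apply("mapreduce.job.id"),
            lookup.apply("mapreduce.task.id"),
            lookup.apply("mapreduce.task.attempt.id"),
            Integer.parseInt(lookup.apply("mapreduce.task.partition")),
            Boolean.parseBoolean(lookup.apply("mapreduce.task.ismap")));
    }

    public static void main(String[] args) {
        // Example values copied from Table 7-3.
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.job.id", "job_200811201130_0004");
        conf.put("mapreduce.task.id", "task_200811201130_0004_m_000003");
        conf.put("mapreduce.task.attempt.id", "attempt_200811201130_0004_m_000003_0");
        conf.put("mapreduce.task.partition", "3");
        conf.put("mapreduce.task.ismap", "true");

        TaskEnvironment env = TaskEnvironment.from(conf::get);
        System.out.println(env.attemptId + " partition=" + env.partition
            + " isMap=" + env.isMap);
    }
}
```

Inside a new-API Mapper, the same call could be made from setup() or map() as TaskEnvironment.from(context.getConfiguration()::get). In the old API, the JobConf passed to configure() could be used the same way, as TaskEnvironment.from(job::get).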