How MapReduce Works - Hadoop: The Definitive Guide

Task Execution

We saw how the MapReduce system executes tasks in the context of the overall job at the beginning of this chapter, in Anatomy of a MapReduce Job Run. In this section, we'll look at some more controls that MapReduce users have over task execution.

The Task Execution Environment

Hadoop provides information to a map or reduce task about the environment in which it is running. For example, a map task can discover the name of the file it is processing (see File information in the mapper), and a map or reduce task can find out the attempt number of the task. The properties in Table 7-3 can be accessed from the job's configuration, obtained in the old MapReduce API by providing an implementation of the configure() method for Mapper or Reducer, where the configuration is passed in as an argument. In the new API, these properties can be accessed from the context object passed to all methods of the Mapper or Reducer.
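As a rough illustration of reading these properties, the sketch below collects the five properties from Table 7-3 into a typed object. The class name TaskEnvironment and its from() helper are hypothetical, not part of Hadoop. To keep the sketch runnable without Hadoop on the classpath, it takes a generic name-to-value lookup rather than a Hadoop configuration object.

```java
import java.util.function.Function;

/** Hypothetical typed view of the task environment properties in Table 7-3. */
final class TaskEnvironment {
    final String jobId;
    final String taskId;
    final String attemptId;
    final int partition;
    final boolean isMap;

    private TaskEnvironment(String jobId, String taskId, String attemptId,
                            int partition, boolean isMap) {
        this.jobId = jobId;
        this.taskId = taskId;
        this.attemptId = attemptId;
        this.partition = partition;
        this.isMap = isMap;
    }

    /**
     * Builds the view from any property-name -> value lookup.
     * Assumption: inside a real task, the lookup would be backed by the
     * job configuration (for example, a method reference to its get(String)
     * method); a plain Map works equally well for local experiments.
     */
    static TaskEnvironment from(Function<String, String> lookup) {
        return new TaskEnvironment(
            require(lookup, "mapreduce.job.id"),
            require(lookup, "mapreduce.task.id"),
            require(lookup, "mapreduce.task.attempt.id"),
            Integer.parseInt(require(lookup, "mapreduce.task.partition")),
            Boolean.parseBoolean(require(lookup, "mapreduce.task.ismap")));
    }

    /** Fails fast with a clear message when a property is not set. */
    private static String require(Function<String, String> lookup, String name) {
        String value = lookup.apply(name);
        if (value == null) {
            throw new IllegalStateException("Property not set: " + name);
        }
        return value;
    }
}
```

In a new-API mapper, one plausible call would be TaskEnvironment.from(context.getConfiguration()::get) from setup(). In the old API, the JobConf passed to configure() could be used in the same way. Taking a lookup function instead of a Hadoop type is a design choice for this sketch only; it lets the parsing logic be exercised with an ordinary Map.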

Table 7-3. Task environment properties

| Property name | Type | Description | Example |
|---|---|---|---|
| mapreduce.job.id | String | The job ID (see Job, Task, and Task Attempt IDs for a description of the format) | job_200811201130_0004 |
| mapreduce.task.id | String | The task ID | task_200811201130_0004_m_000003 |
| mapreduce.task.attempt.id | String | The task attempt ID | attempt_200811201130_0004_m_000003_0 |
| mapreduce.task.partition | int | The index of the task within the job | 3 |
| mapreduce.task.ismap | boolean | Whether this task is a map task | true |
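The example values in Table 7-3 are related: the task attempt ID embeds the job ID, the task ID, the task type, the task index, and the attempt number. The sketch below shows one way to split an attempt ID into those parts. The class AttemptIdParts is hypothetical, and it assumes the six-field layout seen in the table's example, with only m (map) and r (reduce) as task types. In a real job, Hadoop's own TaskAttemptID class would likely be the more robust choice.

```java
/** Hypothetical splitter for IDs such as attempt_200811201130_0004_m_000003_0. */
final class AttemptIdParts {
    final String jobId;
    final String taskId;
    final boolean isMap;
    final int taskIndex;
    final int attemptNumber;

    private AttemptIdParts(String jobId, String taskId, boolean isMap,
                           int taskIndex, int attemptNumber) {
        this.jobId = jobId;
        this.taskId = taskId;
        this.isMap = isMap;
        this.taskIndex = taskIndex;
        this.attemptNumber = attemptNumber;
    }

    /**
     * Assumed layout (taken from the example in Table 7-3):
     * attempt_<timestamp>_<job number>_<m|r>_<task number>_<attempt number>
     */
    static AttemptIdParts parse(String attemptId) {
        String[] p = attemptId.split("_");
        boolean knownType = p.length == 6 && (p[3].equals("m") || p[3].equals("r"));
        if (!knownType || !p[0].equals("attempt")) {
            throw new IllegalArgumentException("Unexpected attempt ID: " + attemptId);
        }
        String jobSuffix = p[1] + "_" + p[2];
        return new AttemptIdParts(
            "job_" + jobSuffix,
            "task_" + jobSuffix + "_" + p[3] + "_" + p[4],
            p[3].equals("m"),
            Integer.parseInt(p[4]),   // leading zeros are dropped
            Integer.parseInt(p[5]));
    }
}
```

Applied to the attempt ID from the table, this yields the same job ID, task ID, task index, and map flag that the other rows show, plus the attempt number from the final field.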
