There is a good case for turning off speculative execution for reduce tasks, since any duplicate reduce tasks have to fetch the same map outputs as the original task, and this can significantly increase network traffic on the cluster.
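In the old MapReduce API this is a single boolean property; it could be set in a job configuration file as shown below (in the new API the equivalent property is mapreduce.reduce.speculative):

```
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```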
Another reason for turning off speculative execution is for nonidempotent tasks. However, in many cases it is possible to write tasks to be idempotent and use an OutputCommitter to promote the output to its final location when the task succeeds. This technique is explained in more detail in the next section.
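The promote-on-success idea can be sketched without Hadoop: each task attempt writes to a private temporary path, and the commit is an atomic rename, so a duplicate or retried attempt never exposes partial output. The names here (PromoteOnCommit, writeAttempt, commit) are illustrative, not part of the Hadoop API:

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of the "promote on success" pattern: an attempt writes to a
// private temporary path, and only an atomic rename publishes the output,
// so reruns of the same task are harmless.
public class PromoteOnCommit {
    static Path writeAttempt(Path workDir, String taskId, String data)
            throws IOException {
        // Each attempt writes to its own private location.
        Path attempt = workDir.resolve("_temporary-" + taskId);
        Files.writeString(attempt, data);
        return attempt;
    }

    static void commit(Path attempt, Path finalPath) throws IOException {
        // The atomic move is the commit: the final file either appears
        // complete or does not appear at all.
        Files.move(attempt, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("job");
        Path attempt = writeAttempt(workDir, "attempt_0", "part-00000 contents");
        commit(attempt, workDir.resolve("part-00000"));
        System.out.println(Files.readString(workDir.resolve("part-00000")));
    }
}
```

Because the rename is atomic and the final contents are the same whichever attempt wins, rerunning the task (speculatively or after failure) is safe.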
Output Committers
Hadoop MapReduce uses a commit protocol to ensure that jobs and tasks either succeed or fail cleanly. The behavior is implemented by the OutputCommitter in use for the job, which is set in the old MapReduce API by calling the setOutputCommitter() method on JobConf or by setting mapred.output.committer.class in the configuration. In the new MapReduce API, the OutputCommitter is determined by the OutputFormat, via its getOutputCommitter() method. The default is FileOutputCommitter, which is appropriate for file-based MapReduce. You can customize an existing OutputCommitter or even write a new implementation if you need to do special setup or cleanup for jobs or tasks.
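In the old API, the configuration-based route mentioned above might look like this in a job configuration file (the value here is simply the default, old-API FileOutputCommitter, shown for illustration; a custom committer would name your own class):

```
<property>
  <name>mapred.output.committer.class</name>
  <value>org.apache.hadoop.mapred.FileOutputCommitter</value>
</property>
```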
The OutputCommitter API is as follows (in both the old and new MapReduce APIs):
public abstract class OutputCommitter {

  public abstract void setupJob(JobContext jobContext) throws IOException;

  public void commitJob(JobContext jobContext) throws IOException { }

  public void abortJob(JobContext jobContext, JobStatus.State state)
      throws IOException { }

  public abstract void setupTask(TaskAttemptContext taskContext)
      throws IOException;

  public abstract boolean needsTaskCommit(TaskAttemptContext taskContext)
      throws IOException;

  public abstract void commitTask(TaskAttemptContext taskContext)
      throws IOException;

  public abstract void abortTask(TaskAttemptContext taskContext)
      throws IOException;
}
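To see how these methods fit together, here is a minimal, Hadoop-free sketch of the commit protocol driving a committer through one successful task attempt. The class and method names mirror the API above, but this is an illustrative analogue, not Hadoop code:

```java
import java.util.*;

// Hadoop-free sketch of the commit protocol: the framework calls setupJob
// once, then setupTask/needsTaskCommit/commitTask per task attempt, and
// finally commitJob once all tasks have succeeded.
public class CommitProtocolSketch {
    static abstract class Committer {
        abstract void setupJob();
        void commitJob() { }                       // optional override
        abstract void setupTask(String attemptId);
        abstract boolean needsTaskCommit(String attemptId);
        abstract void commitTask(String attemptId);
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        Committer c = new Committer() {
            void setupJob()                 { calls.add("setupJob"); }
            @Override void commitJob()      { calls.add("commitJob"); }
            void setupTask(String id)       { calls.add("setupTask:" + id); }
            boolean needsTaskCommit(String id) { return true; } // attempt wrote output
            void commitTask(String id)      { calls.add("commitTask:" + id); }
        };

        c.setupJob();                        // once, before any tasks run
        c.setupTask("attempt_0");            // per task attempt
        if (c.needsTaskCommit("attempt_0"))  // skip commit if nothing was written
            c.commitTask("attempt_0");
        c.commitJob();                       // once, after all tasks succeed

        System.out.println(String.join(",", calls));
    }
}
```

On failure, the framework would call abortTask or abortJob instead of the corresponding commit methods, which is why the real API pairs each commit with an abort.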