There is a good case for turning off speculative execution for reduce tasks, since any duplicate reduce tasks have to fetch the same map outputs as the original task, and this can significantly increase network traffic on the cluster.
Another reason for turning off speculative execution is for nonidempotent tasks. However, in many cases it is possible to write tasks to be idempotent and use an OutputCommitter to promote the output to its final location when the task succeeds. This technique is explained in more detail in the next section.
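Speculative execution can be disabled independently for map and reduce tasks via configuration properties. The following is a minimal sketch using the newer property names (the old-API equivalents are `mapred.map.tasks.speculative.execution` and `mapred.reduce.tasks.speculative.execution`); the class and job names are illustrative, not part of Hadoop:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Illustrative helper: build a job with speculative execution
// turned off for reduce tasks only, so duplicate reducers never
// re-fetch the same map outputs.
public class NoReduceSpeculation {
  public static Job configure() throws IOException {
    Configuration conf = new Configuration();
    // Maps keep the default (speculation on); reduces opt out.
    conf.setBoolean("mapreduce.reduce.speculative", false);
    return Job.getInstance(conf, "no-reduce-speculation");
  }
}
```

This requires the Hadoop client libraries on the classpath; the same properties can also be set cluster-wide in mapred-site.xml.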
Output Committers
Hadoop MapReduce uses a commit protocol to ensure that jobs and tasks either succeed
or fail cleanly. The behavior is implemented by the OutputCommitter in use for the job, which in the old MapReduce API is set by calling setOutputCommitter() on JobConf or by setting mapred.output.committer.class in the configuration. In the new MapReduce API, the OutputCommitter is determined by the OutputFormat, via its getOutputCommitter() method. The default is FileOutputCommitter, which is appropriate for file-based MapReduce. You can customize an existing OutputCommitter or even write a new implementation if you need to do special setup or cleanup for jobs or tasks.
The OutputCommitter API is as follows (in both the old and new MapReduce APIs):
public abstract class OutputCommitter {

  public abstract void setupJob(JobContext jobContext) throws IOException;

  public void commitJob(JobContext jobContext) throws IOException { }

  public void abortJob(JobContext jobContext, JobStatus.State state)
      throws IOException { }

  public abstract void setupTask(TaskAttemptContext taskContext)
      throws IOException;

  public abstract boolean needsTaskCommit(TaskAttemptContext taskContext)
      throws IOException;

  public abstract void commitTask(TaskAttemptContext taskContext)
      throws IOException;

  public abstract void abortTask(TaskAttemptContext taskContext)
      throws IOException;
}
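A common way to customize commit behavior is to subclass FileOutputCommitter rather than implement OutputCommitter from scratch, since the base class already handles promoting task output to its final location. The sketch below assumes the new MapReduce API; the extra marker-file step and the class name are hypothetical, shown only to indicate where job-level cleanup or setup hooks go:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

// Hypothetical committer: reuse FileOutputCommitter's task-commit
// logic, then perform an extra job-level step after a successful commit.
public class MarkerFileOutputCommitter extends FileOutputCommitter {

  public MarkerFileOutputCommitter(Path outputPath, TaskAttemptContext context)
      throws IOException {
    super(outputPath, context);
  }

  @Override
  public void commitJob(JobContext context) throws IOException {
    // Let the base class promote all task outputs to the final
    // output directory first.
    super.commitJob(context);
    // Hypothetical extra step for a succeeded job, e.g. writing a
    // marker file or notifying a downstream system.
  }
}
```

Because commitTask() is only called for the task attempt that is chosen to commit, idempotence concerns are confined to the promotion step that the base class already performs atomically where the filesystem allows.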