mapreduce.reduce.maxattempts for reduce tasks. By default, if any task fails
four times (or whatever the maximum number of attempts is configured to), the whole job
fails.
For some applications, it is undesirable to abort the job if a few tasks fail, as it may be possible to use the results of the job despite some failures. In this case, the maximum percentage of tasks that are allowed to fail without triggering job failure can be set for the job. Map tasks and reduce tasks are controlled independently, using the mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent properties.
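The per-type threshold rule can be sketched as plain Java. This is an illustrative model of the documented semantics, not Hadoop code; the class and method names here are made up for the sketch:

```java
// Illustrative model of the failures.maxpercent rule; FailurePolicy and
// jobFails are not part of the Hadoop API, they just mirror the semantics
// described above: the job aborts once the percentage of permanently
// failed tasks of a given type exceeds the configured maximum.
public class FailurePolicy {
    static boolean jobFails(int failedTasks, int totalTasks, int maxPercent) {
        // Percentage of tasks of this type (map or reduce) that failed
        double failedPercent = 100.0 * failedTasks / totalTasks;
        return failedPercent > maxPercent;
    }

    public static void main(String[] args) {
        // With mapreduce.map.failures.maxpercent set to 10, a job with
        // 5 failed maps out of 100 keeps running...
        System.out.println(jobFails(5, 100, 10));   // false: within threshold
        // ...but 11 failed maps out of 100 aborts it.
        System.out.println(jobFails(11, 100, 10));  // true: threshold exceeded
    }
}
```

Note that map and reduce thresholds are checked independently, so a job can tolerate many map failures while still failing on the first reduce failure if only the map property is set.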
A task attempt may also be killed, which is different from failing. A task attempt may be killed because it is a speculative duplicate (for more information on this topic, see Speculative Execution), or because the node manager it was running on failed and the application master marked all the task attempts running on it as killed. Killed task attempts do not count against the number of attempts to run the task (as set by mapreduce.map.maxattempts and mapreduce.reduce.maxattempts), because it wasn't the task's fault that an attempt was killed.
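The distinction between failed and killed attempts can be sketched as a small accounting model. Again this is illustrative only (the names are invented, not Hadoop source): only failed attempts are charged against the configured maximum.

```java
import java.util.List;

// Illustrative model of attempt accounting; not Hadoop source code.
public class AttemptCounter {
    enum Outcome { SUCCEEDED, FAILED, KILLED }

    // A task fails permanently once its FAILED attempts reach
    // mapreduce.{map,reduce}.maxattempts; KILLED attempts (speculative
    // duplicates, attempts lost with a node manager) are not counted.
    static boolean taskFails(List<Outcome> attempts, int maxAttempts) {
        long failed = attempts.stream()
                              .filter(o -> o == Outcome.FAILED)
                              .count();
        return failed >= maxAttempts;
    }

    public static void main(String[] args) {
        // Three failures plus one killed speculative attempt:
        // still one retry left under the default of four attempts.
        System.out.println(taskFails(
            List.of(Outcome.FAILED, Outcome.KILLED,
                    Outcome.FAILED, Outcome.FAILED), 4)); // false
        // A fourth genuine failure exhausts the attempts and fails the task.
        System.out.println(taskFails(
            List.of(Outcome.FAILED, Outcome.FAILED,
                    Outcome.FAILED, Outcome.FAILED), 4)); // true
    }
}
```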
Users may also kill or fail task attempts using the web UI or the command line (type
mapred job to see the options). Jobs may be killed by the same mechanisms.
Application Master Failure
Just as MapReduce tasks are given several attempts to succeed (in the face of hardware or network failures), applications in YARN are retried in the event of failure. The maximum number of attempts to run a MapReduce application master is controlled by the mapreduce.am.max-attempts property. The default value is 2, so if a MapReduce application master fails twice it will not be tried again and the job will fail.
YARN imposes a limit on the maximum number of attempts for any YARN application master running on the cluster, and individual applications may not exceed this limit. The limit is set by yarn.resourcemanager.am.max-attempts and defaults to 2, so if you want to increase the number of MapReduce application master attempts, you will have to increase the YARN setting on the cluster, too.
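The interaction between the two settings can be sketched as follows. This is an illustrative model with an invented method name, not Hadoop source: the cluster-wide YARN limit caps whatever the per-job MapReduce property asks for.

```java
// Illustrative model of how the YARN cluster limit caps the per-job
// MapReduce AM retry setting; effectiveAttempts is invented for this sketch.
public class AmRetries {
    // Effective number of application master attempts: the per-job
    // mapreduce.am.max-attempts value, capped by the cluster-wide
    // yarn.resourcemanager.am.max-attempts limit.
    static int effectiveAttempts(int mrAmMaxAttempts, int yarnAmMaxAttempts) {
        return Math.min(mrAmMaxAttempts, yarnAmMaxAttempts);
    }

    public static void main(String[] args) {
        // Both default to 2, so the AM gets two attempts.
        System.out.println(effectiveAttempts(2, 2)); // 2
        // Raising only the MapReduce property has no effect:
        // the cluster-wide YARN limit still wins.
        System.out.println(effectiveAttempts(4, 2)); // 2
        // To actually get more attempts, both settings must be raised.
        System.out.println(effectiveAttempts(4, 4)); // 4
    }
}
```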
The way recovery works is as follows. An application master sends periodic heartbeats to the resource manager, and in the event of application master failure, the resource manager will detect the failure and start a new instance of the master running in a new container (managed by a node manager). In the case of the MapReduce application master, it will use the job history to recover the state of the tasks that were already run by the (failed) application master, so they do not have to be rerun.