Use task profiling
Java profilers give a lot of insight into the JVM, and Hadoop provides a mechanism to
profile a subset of the tasks in a job. See Profiling Tasks .
In some cases, it's useful to keep the intermediate files for a failed task attempt for later inspection, particularly if supplementary dump or profile files are created in the task's working directory. You can set mapreduce.task.files.preserve.failedtasks to true to keep a failed task's files.
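As a sketch, this property can be set in mapred-site.xml like any other job configuration value (the snippet below is a minimal, assumed-typical configuration fragment; it can equally be passed per job with -D on the command line):

```xml
<!-- Keep the working files of failed task attempts for post-mortem inspection -->
<property>
  <name>mapreduce.task.files.preserve.failedtasks</name>
  <value>true</value>
</property>
```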
You can keep the intermediate files for successful tasks, too, which may be handy if you want to examine a task that isn't failing. In this case, set the property mapreduce.task.files.preserve.filepattern to a regular expression that matches the IDs of the tasks whose files you want to keep.
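For example, a fragment along these lines would preserve the files of the first few map tasks of a particular job (the application ID and the exact pattern are hypothetical; adjust them to the task IDs you actually want to keep):

```xml
<!-- Keep files for map task attempts 000000-000004 of this (example) job -->
<property>
  <name>mapreduce.task.files.preserve.filepattern</name>
  <value>.*_1410450250506_0002_m_00000[0-4].*</value>
</property>
```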
Another useful property for debugging is yarn.nodemanager.delete.debug-delay-sec, which is the number of seconds to wait before deleting localized task attempt files, such as the script used to launch the task container JVM. If this is set on the cluster to a reasonably large value (e.g., 600 for 10 minutes), then you have enough time to look at the files before they are deleted.
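Since this is a node manager setting, it goes in yarn-site.xml on the cluster nodes rather than in a per-job configuration; a minimal sketch:

```xml
<!-- Delay deletion of localized container files by 10 minutes -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>
```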
To examine task attempt files, log into the node that the task failed on and look for the directory for that task attempt. It will be under one of the local MapReduce directories, as set by the mapreduce.cluster.local.dir property (covered in more detail in Important Hadoop Daemon Properties). If this property is a comma-separated list of directories (to spread load across the physical disks on a machine), you may need to look in all of the directories before you find the directory for that particular task attempt. The task attempt directory is in the following location:

mapreduce.cluster.local.dir/usercache/user/appcache/application-ID/output/task-attempt-ID
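The search across multiple local directories can be sketched with a loop over find. The directory names (local1, local2), the user (alice), and the application/attempt IDs below are hypothetical stand-ins for real cluster values; this sketch creates a mock layout so the loop has something to discover:

```shell
# Mock layout mimicking usercache/user/appcache/application-ID/output/task-attempt-ID
mkdir -p local1/usercache
mkdir -p local2/usercache/alice/appcache/application_1410450250506_0003/output/attempt_1410450250506_0003_m_000000_0

# mapreduce.cluster.local.dir may list several directories; check each one
for d in local1 local2; do
  find "$d/usercache" -type d -name 'attempt_*'
done
```

On a real node you would substitute the directories listed in mapreduce.cluster.local.dir and the attempt ID reported in the job's task logs.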