Use task profiling
Java profilers give a lot of insight into the JVM, and Hadoop provides a mechanism to
profile a subset of the tasks in a job. See Profiling Tasks .
In some cases, it's useful to keep the intermediate files for a failed task attempt for later inspection, particularly if supplementary dump or profile files are created in the task's working directory. You can set mapreduce.task.files.preserve.failedtasks to true to keep a failed task's files.
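As a sketch, this property can be set in mapred-site.xml like any other job configuration value (the snippet below is a minimal, assumed-typical configuration fragment; it can equally be passed per job with -D on the command line):

```xml
<!-- Keep the working files of failed task attempts for post-mortem inspection -->
<property>
  <name>mapreduce.task.files.preserve.failedtasks</name>
  <value>true</value>
</property>
```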
You can keep the intermediate files for successful tasks, too, which may be handy if you want to examine a task that isn't failing. In this case, set the property mapreduce.task.files.preserve.filepattern to a regular expression that matches the IDs of the tasks whose files you want to keep.
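For example, a fragment along these lines would preserve the files of the first few map tasks of a particular job (the application ID and the exact pattern are hypothetical; adjust them to the task IDs you actually want to keep):

```xml
<!-- Keep files for map task attempts 000000-000004 of this (example) job -->
<property>
  <name>mapreduce.task.files.preserve.filepattern</name>
  <value>.*_1410450250506_0002_m_00000[0-4].*</value>
</property>
```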
Another useful property for debugging is yarn.nodemanager.delete.debug-delay-sec, which is the number of seconds to wait before deleting localized task attempt files, such as the script used to launch the task container JVM. If this is set on the cluster to a reasonably large value (e.g., 600 for 10 minutes), then you have enough time to look at the files before they are deleted.
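Since this is a node manager setting, it goes in yarn-site.xml on the cluster nodes rather than in a per-job configuration; a minimal sketch:

```xml
<!-- Delay deletion of localized container files by 10 minutes -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>
```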
To examine task attempt files, log into the node that the task failed on and look for the directory for that task attempt. It will be under one of the local MapReduce directories, as set by the mapreduce.cluster.local.dir property (covered in more detail in Important Hadoop Daemon Properties). If this property is a comma-separated list of directories (to spread load across the physical disks on a machine), you may need to look in all of the directories before you find the directory for that particular task attempt. The task attempt directory is in the following location:

mapreduce.cluster.local.dir/usercache/user/appcache/application-ID/output/task-attempt-ID
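The search across multiple local directories can be sketched with a loop over find. The directory names (local1, local2), the user (alice), and the application/attempt IDs below are hypothetical stand-ins for real cluster values; this sketch creates a mock layout so the loop has something to discover:

```shell
# Mock layout mimicking usercache/user/appcache/application-ID/output/task-attempt-ID
mkdir -p local1/usercache
mkdir -p local2/usercache/alice/appcache/application_1410450250506_0003/output/attempt_1410450250506_0003_m_000000_0

# mapreduce.cluster.local.dir may list several directories; check each one
for d in local1 local2; do
  find "$d/usercache" -type d -name 'attempt_*'
done
```

On a real node you would substitute the directories listed in mapreduce.cluster.local.dir and the attempt ID reported in the job's task logs.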