Developing a MapReduce Application - Hadoop: The Definitive Guide


% hadoop jar hadoop-examples.jar LoggingDriver -conf conf/hadoop-cluster.xml \
  -D mapreduce.map.log.level=DEBUG input/ncdc/sample.txt logging-out

There are some controls for managing the retention and size of task logs. By default, logs are deleted after a minimum of three hours (you can set this using the yarn.nodemanager.log.retain-seconds property, although this is ignored if log aggregation is enabled). You can also set a cap on the maximum size of each logfile using the mapreduce.task.userlog.limit.kb property, which is 0 by default, meaning there is no cap.
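As a rough illustration of these two properties, the sketch below works out values in the units each one expects. The 10 MB cap is an arbitrary example, not a recommendation. The retention property is read by the node managers, so it would normally be set in the cluster's YARN configuration rather than on a job's command line.

```shell
# The default retention of three hours, expressed in the seconds that
# yarn.nodemanager.log.retain-seconds expects.
RETAIN_HOURS=3
RETAIN_SECONDS=$((RETAIN_HOURS * 60 * 60))
echo "yarn.nodemanager.log.retain-seconds = $RETAIN_SECONDS"

# An example cap of 10 MB per task logfile, expressed in the kilobytes that
# mapreduce.task.userlog.limit.kb expects (0 would mean no cap).
LIMIT_MB=10   # arbitrary value chosen for illustration
LIMIT_KB=$((LIMIT_MB * 1024))
echo "-D mapreduce.task.userlog.limit.kb=$LIMIT_KB"
```

The second line printed is in the form of a -D option, so it could be passed to a job in the same way as the mapreduce.map.log.level setting in the LoggingDriver invocation above.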

TIP

Sometimes you may need to debug a problem that you suspect is occurring in the JVM running a Hadoop command, rather than on the cluster. You can send DEBUG-level logs to the console by using an invocation like this:

% HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -text /foo/bar

Remote Debugging

When a task fails and there is not enough information logged to diagnose the error, you may want to resort to running a debugger for that task. This is hard to arrange when running the job on a cluster, as you don't know which node is going to process which part of the input, so you can't set up your debugger ahead of the failure. However, there are a few other options available:

Reproduce the failure locally

Often the failing task fails consistently on a particular input. You can try to reproduce the problem locally by downloading the file that the task is failing on and running the job locally, possibly under a debugger or with a tool such as Java's VisualVM attached.
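One way to arrange a local reproduction is sketched below. The HDFS path, the local directory, and the conf/hadoop-local.xml file (assumed to select the local job runner and the local filesystem) are all hypothetical names, and the job reuses the LoggingDriver example purely for illustration. The sketch relies on the hadoop script passing HADOOP_OPTS to the client JVM, and on tasks running inside that JVM in local mode.

```shell
# Hypothetical names: the HDFS path of the failing input, a local working
# directory, and a configuration file that selects local mode.
FAILING_INPUT=/path/on/hdfs/to/failing-file
LOCAL_DIR=repro
LOCAL_CONF=conf/hadoop-local.xml

# JDWP agent options: the client JVM waits (suspend=y) for a debugger to
# attach on port 8000 before it starts running the job.
DEBUG_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

reproduce_locally() {
  mkdir -p "$LOCAL_DIR/input"
  # Copy the input that the task fails on out of HDFS.
  hadoop fs -get "$FAILING_INPUT" "$LOCAL_DIR/input/"
  # Run the job in local mode, where the tasks run in the client JVM, so a
  # debugger attached to it can step through the map and reduce code.
  HADOOP_OPTS="$DEBUG_OPTS" hadoop jar hadoop-examples.jar LoggingDriver \
    -conf "$LOCAL_CONF" "$LOCAL_DIR/input" "$LOCAL_DIR/output"
}
```

Calling reproduce_locally from a shell on a machine with a Hadoop client installed should leave the job waiting for a debugger; attaching one to port 8000 then lets it proceed. Dropping the HADOOP_OPTS assignment runs the same local job without a debugger.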

Use JVM debugging options

A common cause of failure is a Java out of memory error in the task JVM. You can set mapred.child.java.opts to include -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps. This setting produces a heap dump that can be examined afterward with tools such as jhat or the Eclipse Memory Analyzer. Note that the JVM options should be added to the existing memory settings specified by mapred.child.java.opts. These are explained in more detail in Memory settings in YARN and MapReduce.
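The point about adding to the existing settings rather than replacing them can be sketched as follows. The -Xmx1024m value is only an example of an existing memory setting, and /path/to/dumps is a placeholder for a writable directory on the nodes that run the tasks. Note that the heap dump flag is switched on with a plus sign (-XX:+...); a minus sign would disable it.

```shell
# Existing memory setting for the task JVMs (example value; use your own).
EXISTING_OPTS="-Xmx1024m"

# Heap dump flags; /path/to/dumps is a placeholder directory.
DUMP_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps"

# Append the dump flags to the memory settings instead of overwriting them.
CHILD_OPTS="$EXISTING_OPTS $DUMP_OPTS"
echo "-D mapred.child.java.opts=\"$CHILD_OPTS\""
```

The printed option could then be added to a job invocation alongside any other -D settings. Keeping the memory setting at the front makes it easy to check that the original heap size is still in effect.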
