In addition to the memory requirements of the daemons, the node manager allocates containers to applications, so we need to factor these into the total memory footprint of a worker machine; see Memory settings in YARN and MapReduce.
System logfiles
System logfiles produced by Hadoop are stored in $HADOOP_HOME/logs by default. This can be changed using the HADOOP_LOG_DIR setting in hadoop-env.sh. It's a good idea to change this so that logfiles are kept out of the Hadoop installation directory; that way they stay in one place even when the installation directory changes after an upgrade. A common choice is /var/log/hadoop, set by including the following line in hadoop-env.sh:
export HADOOP_LOG_DIR=/var/log/hadoop
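If you point HADOOP_LOG_DIR at a system location like this, an unprivileged Hadoop user usually can't create the directory under /var/log, so it is common to pre-create it with appropriate ownership. A minimal sketch, assuming the daemons run as a hypothetical hadoop user and hadoop group:
# Assumption: the daemons run as user "hadoop" in group "hadoop"; adjust to your setup.
sudo mkdir -p /var/log/hadoop
sudo chown hadoop:hadoop /var/log/hadoop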
The log directory will be created if it doesn't already exist. (If it does not exist, confirm that the Unix user running the relevant Hadoop daemon has permission to create it.) Each Hadoop daemon running on a machine produces two logfiles. The first is the log output written via log4j. This file, whose name ends in .log, should be the first port of call when diagnosing problems because most application log messages are written here. The standard Hadoop log4j configuration uses a daily rolling file appender to rotate logfiles. Old logfiles are never deleted, so you should arrange for them to be periodically deleted or archived, so as not to run out of disk space on the local node.
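One common way to arrange this is a periodic job that removes rotated log4j files past a retention window. A minimal sketch, assuming the /var/log/hadoop location above and a hypothetical 30-day retention, run from cron:
# Hypothetical cron entry (e.g. in /etc/cron.d/hadoop-logs): delete rotated
# .log files older than 30 days; adjust the path and retention to your needs.
30 3 * * * root find /var/log/hadoop -name '*.log.*' -mtime +30 -delete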
The second logfile is the combined standard output and standard error log. This logfile, whose name ends in .out, usually contains little or no output, since Hadoop uses log4j for logging. It is rotated only when the daemon is restarted, and only the last five logs are retained. Old logfiles are suffixed with a number between 1 and 5, with 5 being the oldest file.
Logfile names (of both types) are a combination of the name of the user running the daemon, the daemon name, and the machine hostname. For example, hadoop-hdfs-datanode-ip-10-45-174-112.log.2014-09-20 is the name of a logfile after it has been rotated. This naming structure makes it possible to archive logs from all machines in the cluster in a single directory, if needed, since the filenames are unique.
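Because the filenames already encode the user, daemon name, and hostname, a simple pull-based collection into one directory works without collisions. A minimal sketch, assuming a hypothetical workers file listing hostnames, passwordless SSH, and the /var/log/hadoop location above:
# Hypothetical collection script: pull each worker's logs into a single
# archive directory; names are unique per machine, so nothing collides.
while read host; do
  rsync -az "$host":/var/log/hadoop/ /archive/hadoop-logs/
done < workers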
The username in the logfile name is actually the default for the HADOOP_IDENT_STRING setting in hadoop-env.sh. If you wish to give the Hadoop instance a different identity for the purposes of naming the logfiles, change HADOOP_IDENT_STRING to be the identifier you want.
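For example, to name the logfiles after a shared identifier (a hypothetical hadoop value here) rather than the Unix username of whoever starts the daemons, you could add the following to hadoop-env.sh:
export HADOOP_IDENT_STRING=hadoop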