The file is commented to make it easier for you to set the logging levels. As you can see in the preceding code example, you can set the log levels to WARN to stop logging generic INFO messages. You can opt to enable DEBUG messages only for specific services, such as the JobTracker and TaskTracker, when you need to troubleshoot them. To shrink the logs further, you can set the logging level to ERROR to suppress all warnings and record only errors. There are other properties of interest as well, especially those that control log rollover, the retention period, the maximum file size, and so on, as shown in the following snippet:
# Roll over at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
# 30-day backup (uncomment to cap the number of rolled-over files kept)
#log4j.appender.DRFA.MaxBackupIndex=30
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout

# Default values for the task log appender
hadoop.tasklog.taskid=null
hadoop.tasklog.iscleanup=false
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12
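To apply the per-service level changes described earlier, you can add logger entries to the same file. The following lines are a minimal sketch; the JobTracker and TaskTracker logger names follow the Hadoop 1.x package layout, so verify them against the class names in your distribution:
# Log only warnings and errors from the JobTracker and TaskTracker
log4j.logger.org.apache.hadoop.mapred.JobTracker=WARN
log4j.logger.org.apache.hadoop.mapred.TaskTracker=WARN
# Uncomment to get verbose DEBUG output from the TaskTracker while troubleshooting
#log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG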
Simple settings like these can go a long way toward controlling log file growth and keeping logs from eating up disk space over time. You have limited control over what your users decide to log in their MapReduce code, but you do have control over the log levels for task attempts and execution.
Each of the data nodes has a userlogs folder inside the C:\apps\dist\hadoop-1.2.0.1.3.1.0-06\logs\
directory. This folder contains a historical record of all the MapReduce jobs and tasks executed in the cluster. To create
a complete chain of logs, however, you need to visit the userlogs folder of every data node in the cluster and aggregate
the logs based on timestamp. This is because the JobTracker dynamically picks which data nodes execute a specific
task during a job's execution. Figure 11-5 shows the userlogs directory of one of the data nodes after a few job
executions in the cluster.
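If you need to stitch those per-node logs together, a small script can merge them by timestamp. The following is a minimal sketch, assuming you have already copied each node's userlogs folder into a local collected\<node> directory; that layout and the default log4j timestamp format (yyyy-MM-dd HH:mm:ss,SSS) are assumptions you should adjust to match your cluster:
import os
import re

# Hypothetical layout after collection: collected\<node>\userlogs\<attempt>\syslog
COLLECTED_ROOT = r"collected"

# Default log4j timestamp prefix, for example: 2013-08-15 10:42:07,123 INFO ...
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")

entries = []
for dirpath, _, filenames in os.walk(COLLECTED_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        with open(path, errors="replace") as f:
            for line in f:
                match = TS.match(line)
                if match:  # keep only lines that start with a timestamp
                    entries.append((match.group(1), path, line.rstrip("\n")))

# Sort the merged stream chronologically to rebuild the job's chain of events
for timestamp, path, line in sorted(entries):
    print(timestamp, path, line, sep="  ")
Note that lines without a timestamp prefix (stack traces, for example) are dropped by this sketch; a more complete version would attach them to the preceding timestamped line.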