Chapter 11
Logging in HDInsight
A complex ecosystem like Hadoop must have a detailed logging mechanism to fall back on when something goes
wrong. In traditional Hadoop, all the services (NameNode, JobTracker, TaskTracker, and so on) have logging
capabilities, and every operation is logged from service startup to shutdown. Apart from the startup of the
services or daemons, additional events need to be recorded, such as job requests, interprocess
communication between the services, and job execution history.
The HDInsight distribution extends this logging mechanism with its own. The entire cluster storage for the
HDInsight service resides in Azure in the form of blob containers, so you need to know and rely on the
Azure storage logs to track down any access or space limitation issues. This chapter focuses on the logging
and instrumentation available for the Windows Azure-based Hadoop services and also gives you a glimpse into the
traditional Hadoop logging mechanism.
Hadoop uses the Apache Log4j framework, a logging package for Java. The framework not only logs operational
information, it also lets you tune the level of logging as required (for example, errors or warnings) and
offers several instrumentation options, such as log recycling and maintaining log history. This chapter
covers a few key Log4j properties; for a detailed understanding of the Log4j framework, visit the
Apache site:
http://logging.apache.org/log4j/2.x/manual/index.html
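To make the idea of logging levels concrete, here is a minimal Python sketch that keeps only the lines of a log at or above a chosen severity. It is an illustration, not part of Hadoop or HDInsight: the function name filter_by_level, the sample lines, and the assumed line layout (date, time, level, logger name, message) are all hypothetical, and a real log may use a different layout depending on how Log4j is configured.

```python
import re

# Log4j 1.x severity levels, ordered from least to most severe.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]

# Assumed layout: "<date> <time> <LEVEL> <logger>: <message>".
# This is an assumption; adjust the pattern to match the actual log format.
LINE_RE = re.compile(r"^\S+ \S+ (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) ")


def filter_by_level(lines, minimum="WARN"):
    """Return the lines whose level is at or above `minimum`."""
    threshold = LEVELS.index(minimum)
    kept = []
    for line in lines:
        match = LINE_RE.match(line)
        # Lines that do not match the assumed layout are skipped.
        if match and LEVELS.index(match.group(1)) >= threshold:
            kept.append(line)
    return kept


# Hypothetical sample lines, written for this example only.
sample = [
    "2013-10-01 12:00:00,123 INFO org.example.Service: started",
    "2013-10-01 12:00:01,456 WARN org.example.Service: disk space low",
    "2013-10-01 12:00:02,789 ERROR org.example.Service: write failed",
]

if __name__ == "__main__":
    for line in filter_by_level(sample, "WARN"):
        print(line)
```

Filtering after the fact like this only helps when the messages were written in the first place; the logging level set in the Log4j configuration decides what reaches the file.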
Service Logs
Hadoop daemons are replaced by Windows Services in the HDInsight distribution. Different services run on different
nodes of the cluster based on the role they play. You need to make a remote desktop connection to the nodes to access
their respective log files.
Service Trace Logs
The service startup logs for the Hadoop services are located in the C:\apps\dist\hadoop-1.2.0.1.3.1.0-06\bin
directory. Similarly, other service-based projects in the ecosystem (such as Hive and Oozie) log their service startup
operations in their respective bin folders. These files have the .trace.log extension, and they are created
and written to during the startup of the services. Table 11-1 summarizes the different types of trace.log files available
for the projects shipped in the current distribution of HDInsight on Azure.
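Once you are connected to a node, a short script can list the trace logs in a given bin folder. The following Python sketch is only an illustration: the helper name find_trace_logs is hypothetical, it assumes Python is available on the node, and the directory you pass in must be the path that applies to your own cluster version.

```python
from pathlib import Path


def find_trace_logs(bin_dir):
    """Return the sorted names of the *.trace.log files directly under bin_dir."""
    return sorted(p.name for p in Path(bin_dir).glob("*.trace.log"))


if __name__ == "__main__":
    import sys

    # Pass the bin directory as the first argument; default to the current folder.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in find_trace_logs(target):
        print(name)
```

The function looks only at the top level of the folder and returns an empty list when no matching files exist, so it is safe to run against any of the project bin folders.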
 