Listing 13-4. mapred-site.xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.map.max.attempts</name>
  <value>8</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>8</value>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>600000</value>
</property>
<property>
  <name>mapred.max.split.size</name>
  <value>536870912</value>
</property>
If you run active Hadoop clusters, there are numerous scenarios in which you will have to come back and check the
properties in Listing 13-4. Most of them come into play when jobs take an unusually long time to complete and need
optimization or tuning. For many other errors that surface during job submission, the log files are a source of a
great deal of information.
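When tuning these values, it helps to confirm what the cluster's configuration file actually contains. The sketch below parses Hadoop's `<property>`/`<name>`/`<value>` layout with Python's standard library; because Listing 13-4 shows the properties without the enclosing `<configuration>` root element that a real `mapred-site.xml` carries, a minimal sample is embedded inline here. On an actual node you would point the parser at the deployed file instead.

```python
import xml.etree.ElementTree as ET

# Minimal inline stand-in for mapred-site.xml, built from two of the
# properties in Listing 13-4. On a real cluster, read the deployed file.
SAMPLE = """<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>600000</value>
  </property>
</configuration>"""

def read_properties(xml_text):
    """Return a dict mapping each <name> to its <value>."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

props = read_properties(SAMPLE)
print(props["mapred.task.timeout"])  # 600000
```

To read the file from disk, replace `ET.fromstring` with `ET.parse(path).getroot()`; the property-iteration logic stays the same.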
Log Files
I covered the different types of logs generated by Hadoop and the HDInsight service in detail in Chapter 11. However,
let's quickly review the logging infrastructure for MapReduce jobs. By default, the log files are stored in the
C:\apps\dist\hadoop-1.2.0.1.3.1.0-06\logs\ and C:\apps\dist\hadoop-1.2.0.1.3.1.0-06\bin\ folders.
The jobtracker.trace.log file resides in the bin directory, and it logs the job startup command and the
process ID. A sample trace would be similar to Listing 13-5.
Listing 13-5. jobtracker.trace.log
HadoopServiceTraceSource Information: 0 : Tracing successfully initialized
DateTime=2013-11-24T06:35:12.0190000Z
Timestamp=3610300511
HadoopServiceTraceSource Information: 0 : Loading service xml:
c:\apps\dist\hadoop-1.2.0.1.3.1.0-06\bin\jobtracker.xml
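When scanning a long trace log for a particular startup event, it can be handy to group each message with the `DateTime=` and `Timestamp=` attribute lines that follow it. The sketch below does this for the line layout shown in Listing 13-5; the format is assumed solely from that sample, so adjust the parsing if your trace entries differ.

```python
# Inline excerpt of jobtracker.trace.log, taken from Listing 13-5.
TRACE = """HadoopServiceTraceSource Information: 0 : Tracing successfully initialized
DateTime=2013-11-24T06:35:12.0190000Z
Timestamp=3610300511"""

def parse_trace(text):
    """Group each trace message with its key=value attribute lines."""
    entries, current = [], None
    for line in text.splitlines():
        if " : " in line and "=" not in line:
            # A message line: keep only the text after the last " : ".
            current = {"message": line.split(" : ", 2)[-1]}
            entries.append(current)
        elif current is not None and "=" in line:
            # An attribute line such as DateTime=... or Timestamp=...
            key, _, value = line.partition("=")
            current[key] = value
    return entries

entries = parse_trace(TRACE)
print(entries[0]["message"])  # Tracing successfully initialized
```

Reading the real file is a matter of passing `open(path).read()` to `parse_trace`; filtering `entries` by message text then narrows a long log down to the startup events of interest.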