Listing 13-4. mapred-site.xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.map.max.attempts</name>
  <value>8</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>8</value>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>600000</value>
</property>
<property>
  <name>mapred.max.split.size</name>
  <value>536870912</value>
</property>
If you run active Hadoop clusters, there are numerous scenarios in which you will have to come back and check the
properties in Listing 13-4. Most of them come into play when jobs take an unusually long time to complete and need
optimization or tuning. For many other errors that surface during job submission, the log files are a source of a
great deal of information.
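When tuning these values, it helps to confirm what the cluster's configuration file actually contains. The sketch below parses Hadoop's `<property>`/`<name>`/`<value>` layout with Python's standard library; because Listing 13-4 shows the properties without the enclosing `<configuration>` root element that a real `mapred-site.xml` carries, a minimal sample is embedded inline here. On an actual node you would point the parser at the deployed file instead.

```python
import xml.etree.ElementTree as ET

# Minimal inline stand-in for mapred-site.xml, built from two of the
# properties in Listing 13-4. On a real cluster, read the deployed file.
SAMPLE = """<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>600000</value>
  </property>
</configuration>"""

def read_properties(xml_text):
    """Return a dict mapping each <name> to its <value>."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

props = read_properties(SAMPLE)
print(props["mapred.task.timeout"])  # 600000
```

To read the file from disk, replace `ET.fromstring` with `ET.parse(path).getroot()`; the property-iteration logic stays the same.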
Log Files
I covered the different types of logs generated by Hadoop and the HDInsight service in detail in Chapter 11. However,
let's quickly review the logging infrastructure for MapReduce jobs. By default, the log files are stored in the
C:\apps\dist\hadoop-1.2.0.1.3.1.0-06\logs\ and C:\apps\dist\hadoop-1.2.0.1.3.1.0-06\bin\ folders.
The jobtracker.trace.log file resides in the bin directory, and it logs the job startup command and the
process ID. A sample trace would be similar to Listing 13-5.
Listing 13-5. jobtracker.trace.log
HadoopServiceTraceSource Information: 0 : Tracing successfully initialized
DateTime=2013-11-24T06:35:12.0190000Z
Timestamp=3610300511
HadoopServiceTraceSource Information: 0 : Loading service xml:
c:\apps\dist\hadoop-1.2.0.1.3.1.0-06\bin\jobtracker.xml
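When scanning a long trace log for a particular startup event, it can be handy to group each message with the `DateTime=` and `Timestamp=` attribute lines that follow it. The sketch below does this for the line layout shown in Listing 13-5; the format is assumed solely from that sample, so adjust the parsing if your trace entries differ.

```python
# Inline excerpt of jobtracker.trace.log, taken from Listing 13-5.
TRACE = """HadoopServiceTraceSource Information: 0 : Tracing successfully initialized
DateTime=2013-11-24T06:35:12.0190000Z
Timestamp=3610300511"""

def parse_trace(text):
    """Group each trace message with its key=value attribute lines."""
    entries, current = [], None
    for line in text.splitlines():
        if " : " in line and "=" not in line:
            # A message line: keep only the text after the last " : ".
            current = {"message": line.split(" : ", 2)[-1]}
            entries.append(current)
        elif current is not None and "=" in line:
            # An attribute line such as DateTime=... or Timestamp=...
            key, _, value = line.partition("=")
            current[key] = value
    return entries

entries = parse_trace(TRACE)
print(entries[0]["message"])  # Tracing successfully initialized
```

Reading the real file is a matter of passing `open(path).read()` to `parse_trace`; filtering `entries` by message text then narrows a long log down to the startup events of interest.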