Troubleshooting Job Failures - Pro Microsoft HDInsight: Hadoop on Windows

Initializing 'default' queue with cap=75.0, maxCap=-1.0, ulMin=100, ulMinFactor=100.0, supportsPriorities=false, maxJobsToInit=2250, maxJobsToAccept=22500, maxActiveTasks=200000, maxJobsPerUserToInit=2250, maxJobsPerUserToAccept=22500, maxActiveTasksPerUser=100000

2013-11-24 07:05:16,099 INFO org.apache.hadoop.mapred.JobTracker: jobToken generated and stored with users keys in /mapred/system/job_201311240635_0001/jobToken

2013-11-24 07:05:16,796 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240635_0001: nMaps=1 nReduces=0 max=-1

2013-11-24 07:05:16,799 INFO org.apache.hadoop.mapred.JobQueuesManager: Job job_201311240635_0001 submitted to queue joblauncher

2013-11-24 07:05:16,800 INFO org.apache.hadoop.mapred.JobTracker: Job job_201311240635_0001 added successfully for user 'admin' to queue 'joblauncher'

2013-11-24 07:05:16,803 INFO org.apache.hadoop.mapred.AuditLogger: USER=admin IP=xx.xx.xx.xx OPERATION=SUBMIT_JOB TARGET=job_201311240635_0001 RESULT=SUCCESS

2013-11-24 07:05:19,329 INFO org.apache.hadoop.mapred.JobInitializationPoller: Passing to Initializer Job Id :job_201311240635_0001 User: admin Queue : joblauncher

2013-11-24 07:05:24,324 INFO org.apache.hadoop.mapred.JobInitializationPoller: Initializing job : job_201311240635_0001 in Queue joblauncher For user : admin

2013-11-24 07:05:24,324 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201311240635_0001

2013-11-24 07:05:24,325 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201311240635_0001

2013-11-24 07:05:24,576 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201311240635_0001 = 0. Number of splits = 1

2013-11-24 07:05:24,577 INFO org.apache.hadoop.mapred.JobInProgress: job_201311240635_0001 LOCALITY_WAIT_FACTOR=0.0

2013-11-24 07:05:24,578 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201311240635_0001 initialized successfully with 1 map tasks and 0 reduce tasks.

2013-11-24 07:05:24,659 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201311240635_0001_m_000002_0' to tip task_201311240635_0001_m_000002, for tracker 'tracker_workernode1:127.0.0.1/127.0.0.1:49193'

2013-11-24 07:05:28,224 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311240635_0001_m_000002_0' has completed task_201311240635_0001_m_000002 successfully.

The preceding log excerpt shows the key settings configured to execute this job. Because the jobtracker.trace.log file records the command, you can easily determine which parameters were overridden on the command line and which were inherited from the configuration files, and then take appropriate corrective action.
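When a log like the preceding one runs to thousands of lines, it can help to pull out just the per-job facts. The following is a minimal sketch in Python, not part of HDInsight or Hadoop. It assumes the message wording seen in the excerpt above (the nMaps/nReduces line and the "added successfully" line); the function name summarize_jobs and the sample text are illustrative only, and other Hadoop versions may phrase these messages differently.

```python
import re

# Patterns are based only on the message shapes visible in the excerpt above;
# other Hadoop versions may word these lines differently.
JOB_SHAPE = re.compile(r"(job_\d+_\d+): nMaps=(\d+) nReduces=(\d+)")
JOB_QUEUE = re.compile(
    r"Job (job_\d+_\d+) added successfully for user '([^']+)' to queue '([^']+)'"
)


def summarize_jobs(log_text):
    """Return {job_id: {...}} with the task counts, user, and queue found in log_text."""
    jobs = {}
    for match in JOB_SHAPE.finditer(log_text):
        job_id, maps, reduces = match.groups()
        jobs.setdefault(job_id, {}).update(maps=int(maps), reduces=int(reduces))
    for match in JOB_QUEUE.finditer(log_text):
        job_id, user, queue = match.groups()
        jobs.setdefault(job_id, {}).update(user=user, queue=queue)
    return jobs


# Sample text copied from the log excerpt in this section.
sample = """\
2013-11-24 07:05:16,796 INFO org.apache.hadoop.mapred.JobInProgress:
job_201311240635_0001: nMaps=1 nReduces=0 max=-1
2013-11-24 07:05:16,800 INFO org.apache.hadoop.mapred.JobTracker:
Job job_201311240635_0001 added successfully for user 'admin' to queue 'joblauncher'
"""

print(summarize_jobs(sample))
```

Because each pattern matches within a single message line, the sketch works whether the timestamp and the message sit on one line or are wrapped across two.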

Compress Job Output

Hadoop is intended for storing large data volumes, so compression is effectively a requirement for conserving storage and reducing I/O. You can compress your MapReduce job output by adding the following two parameters to your mapred-site.xml file:

mapred.output.compress=true

mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
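In mapred-site.xml itself, each parameter is written as a property element with a name and a value inside the configuration root. The following Python sketch shows that shape by generating the fragment with the standard library. The helper name build_configuration is illustrative, and the codec value is an assumption: it uses org.apache.hadoop.io.compress.GzipCodec, the gzip codec class bundled with Apache Hadoop, so check the class name against your cluster before relying on it.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def build_configuration(settings):
    """Build a Hadoop-style <configuration> element from a dict of name/value pairs."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


compression_settings = {
    "mapred.output.compress": "true",
    # Assumption: the stock gzip codec class that ships with Apache Hadoop.
    "mapred.output.compression.codec": "org.apache.hadoop.io.compress.GzipCodec",
}

config = build_configuration(compression_settings)
raw_xml = ET.tostring(config, encoding="unicode")

# Pretty-print so the fragment can be compared with an existing mapred-site.xml.
print(minidom.parseString(raw_xml).toprettyxml(indent="  "))
```

The printed property elements are meant to be merged by hand into the existing configuration element of the file rather than used to overwrite it.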

Apart from these parameters, MapReduce lets the application developer specify compression for both the intermediate map outputs and the job outputs (that is, the output of the reducers). Such compression can be set up in your custom MapReduce program through a CompressionCodec implementation, such as the one Hadoop bundles for the zlib compression algorithm. For extensive details on Hadoop compression, see the whitepaper at http://msdn.microsoft.com/en-us/dn168917.aspx.
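To get a feel for what a zlib-based codec does to typical job output, the following sketch compresses some repetitive text with Python's standard zlib module. It illustrates only the algorithm, not Hadoop's CompressionCodec classes, and the records are made up for the example, so the ratio it prints says nothing definite about your own data.

```python
import zlib

# Hypothetical reducer output: repetitive tab-separated text, the kind of
# data that usually compresses well.
records = "".join(f"workernode{i % 4}\tSUCCESS\t{i}\n" for i in range(10_000))
raw = records.encode("utf-8")

compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)

# Compression is lossless: the round trip returns the original bytes.
assert restored == raw

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
print(f"ratio: {len(raw) / len(compressed):.1f}x")
```

The trade-off is CPU time: every compressed block must be decompressed by whatever reads it next, which is why intermediate map output and final job output are configured separately.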
