2013-11-16 17:28:53,736 INFO org.apache.hadoop.mapred.JobInProgress$JobSummary:
jobId=job_201311120315_0003,submitTime=1384622907254,
launchTime=1384622909953,firstMapTaskLaunchTime=1384622917870,
firstReduceTaskLaunchTime=1384622922122,firstJobSetupTaskLaunchTime=1384622909966,
firstJobCleanupTaskLaunchTime=1384622931484,finishTime=1384622933735,numMaps=1,
numSlotsPerMap=1,numReduces=1,numSlotsPerReduce=1,user=amarpb,queue=default,
status=SUCCEEDED,mapSlotSeconds=8,reduceSlotsSeconds=9,clusterMapCapacity=16,
clusterReduceCapacity=8, jobName=select count(*) from hivesampletable(Stage-1)
2013-11-16 17:28:53,790 INFO org.apache.hadoop.mapred.JobQueuesManager:
Job job_201311120315_0003 submitted to queue default has completed
2013-11-16 17:28:53,791 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201311120315_0003_m_000000_0'
2013-11-16 17:28:53,791 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201311120315_0003_m_000001_0'
2013-11-16 17:28:53,791 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201311120315_0003_m_000002_0'
2013-11-16 17:28:53,792 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201311120315_0003_r_000000_0'
2013-11-16 17:28:53,815 INFO org.apache.hadoop.mapred.JobHistory:
Creating DONE subfolder at wasb://democlustercontainer@democluster.blob.core.windows.net/mapred/
history/done/version-1/jobtrackerhost_1384226104721_/2013/11/16/000000
2013-11-16 17:28:53,978 INFO org.apache.hadoop.mapred.JobHistory:
Moving file:/c:/apps/dist/hadoop-1.2.0.1.3.0.1-
0302/logs/history/job_201311120315_0003_1384622907254_desarkar_
select+count%28%20F%29+from+hivesampletable%28Stage-1%29_default_%20F
to wasb://testhdi@democluster.blob.core.windows.net/mapred/history/done/
version-1/jobtrackerhost_1384226104721_/2013/11/16/000000
2013-11-16 17:28:54,322 INFO org.apache.hadoop.mapred.JobHistory:
Moving file:/c:/apps/dist/hadoop-1.2.0.1.3.0.1-0302/logs/history/
job_201311120315_0003_conf.xml to wasb://democlustercontainer@democluster.blob.core.windows.net/mapred/
history/done/version-1/jobtrackerhost_1384226104721_/2013/11/16/000000
The JobTracker log files are pretty verbose. If you go through them carefully, you should be able to track down
and resolve any errors in your Hive data-processing jobs.
Troubleshooting can be tricky, however, if the problem is with job performance. If your Hive queries join multiple tables across different partitions, query response times can be quite long. In some cases, such queries need manual tuning for optimum throughput. To that end, the following subsections provide some best practices for better execution performance.
Compress Intermediate Files
A large volume of intermediate files is generated during the execution of MapReduce jobs. Analysis has shown that compressing these intermediate files tends to improve job execution performance. You can execute the following SET commands from the Hive command line to set the compression parameters:
set mapred.compress.map.output=true;
set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set hive.exec.compress.intermediate=true;
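As an illustrative sketch (assuming a Hive session on the cluster; the query simply mirrors the select count(*) from hivesampletable job shown in the log excerpt above), you could apply the settings for the current session, verify one of them, and then run the job:
-- enable compression of intermediate map output for this session
set mapred.compress.map.output=true;
set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set hive.exec.compress.intermediate=true;
-- issuing SET with only the property name prints its current value
set hive.exec.compress.intermediate;
-- run the query; the intermediate files it produces are now gzip-compressed
select count(*) from hivesampletable;
Note that these SET commands affect only the current Hive session; to make the change cluster-wide you would place the same properties in the cluster configuration files instead.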