Troubleshooting Job Failures - Pro Microsoft HDInsight: Hadoop on Windows


• JobHistoryServer: This service serves historical information about completed jobs.

JobHistoryServer can be embedded within the JobTracker process. If you have an extremely busy cluster, it is recommended that you run it as a separate service instead. You can do this by setting the mapreduce.history.server.embedded property to false in the mapred-site.xml file. Running this service consumes considerable disk space because it saves job history information for all jobs.
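To check which mode a cluster is using, the following minimal Python sketch reads the mapreduce.history.server.embedded property from the contents of mapred-site.xml. The function name history_server_embedded is hypothetical. Treating a missing property as true is an assumption about the Hadoop 1.x default that you should confirm for your distribution.

```python
import xml.etree.ElementTree as ET

HISTORY_EMBEDDED = "mapreduce.history.server.embedded"


def history_server_embedded(mapred_site_xml: str, default: bool = True) -> bool:
    """Hypothetical helper: True if JobHistoryServer runs inside the JobTracker.

    When the property is absent, `default` is returned; True is an assumed
    Hadoop 1.x default, so verify it against your mapred-default.xml.
    """
    root = ET.fromstring(mapred_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == HISTORY_EMBEDDED:
            return (prop.findtext("value") or "").strip().lower() == "true"
    return default


sample = """<configuration>
  <property>
    <name>mapreduce.history.server.embedded</name>
    <value>false</value>
  </property>
</configuration>"""

print(history_server_embedded(sample))  # False: runs as a separate service
```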

■ Note In Hadoop versions 2.0 and beyond, the classic MapReduce framework is replaced by YARN and MapReduce 2.0 (also known as MRv2). YARN is a subproject of Hadoop at the Apache Software Foundation that was introduced in Hadoop 2.0. It separates the resource-management and processing components, and it provides a more generalized processing platform that is not restricted to just MapReduce.

Configuration Files

There are two key configuration files that have the various parameters for MapReduce jobs. These files are located in the path C:\apps\dist\hadoop-1.2.0.1.3.1.0-06\conf\ of the NameNode:

• core-site.xml
• mapred-site.xml
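Both files share the same <configuration>/<property> layout, so one small parser covers them. The following Python sketch, with hypothetical function names, reads both files from the conf folder into a single dictionary, letting mapred-site.xml override core-site.xml for duplicate names. That override order is a simplification: it ignores Hadoop's <final> flag.

```python
import os
import xml.etree.ElementTree as ET

# Path taken from the text; the version segment varies between HDInsight releases.
CONF_DIR = r"C:\apps\dist\hadoop-1.2.0.1.3.1.0-06\conf"


def parse_hadoop_config(xml_text: str) -> dict:
    """Hypothetical helper: turn <property><name/><value/></property> entries into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name:  # skip malformed entries with no name
            props[name] = (prop.findtext("value") or "").strip()
    return props


def load_mapreduce_config(conf_dir: str = CONF_DIR) -> dict:
    """Hypothetical helper: merge both files; later files override earlier ones."""
    merged = {}
    for filename in ("core-site.xml", "mapred-site.xml"):
        # utf-8-sig tolerates a byte-order mark, which Windows editors sometimes add.
        with open(os.path.join(conf_dir, filename), encoding="utf-8-sig") as f:
            merged.update(parse_hadoop_config(f.read()))
    return merged
```

On the NameNode, calling load_mapreduce_config() returns one lookup table for both files.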

core-site.xml

This file contains configuration settings for Hadoop Core, such as I/O settings that are common to Windows Azure Storage Blob (WASB) and MapReduce. It is used by all Hadoop services and clients because all services need to know how to locate the NameNode. Each node running a Hadoop service has a copy of this file. The file has several key elements of interest, particularly because the storage infrastructure has moved to WASB instead of the Hadoop Distributed File System (HDFS), which used to be local to the data nodes. For example, in your democluster, you should see entries in your core-site.xml file similar to Listing 13-1.

Listing 13-1. WASB detail

<property>
  <name>fs.default.name</name>
  <value>wasb://democlustercontainer@democluster.blob.core.windows.net</value>
  <description>The name of the default file system. Either the
  literal string "local" or a host:port for NDFS.</description>
</property>
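The fs.default.name value encodes both the container and the storage account. As a minimal sketch, assuming the wasb://<container>@<account>.blob.core.windows.net form shown in Listing 13-1, the following Python function (a hypothetical helper) splits the value into those two parts so they can be checked against your storage account.

```python
from urllib.parse import urlsplit

BLOB_SUFFIX = ".blob.core.windows.net"


def parse_wasb_uri(uri: str) -> tuple:
    """Hypothetical helper: split a WASB URI into (container, account)."""
    parts = urlsplit(uri.strip())
    # wasbs is the TLS variant of the wasb scheme.
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri!r}")
    container, sep, host = parts.netloc.partition("@")
    if not sep or not host.endswith(BLOB_SUFFIX):
        raise ValueError(f"unexpected WASB authority: {parts.netloc!r}")
    return container, host[: -len(BLOB_SUFFIX)]


print(parse_wasb_uri("wasb://democlustercontainer@democluster.blob.core.windows.net"))
# ('democlustercontainer', 'democluster')
```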

If there is an issue with accessing your storage that is causing your jobs to fail, the core-site.xml file is the first place where you should confirm that your cluster is pointing toward the correct storage account and container.
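To make that confirmation repeatable, you can compare the configured default file system with the account and container you expect. This Python sketch takes the contents of core-site.xml as text. The function name default_fs_matches is hypothetical, and accepting both wasb:// and wasbs:// is a design choice of the sketch, not something the text requires.

```python
import xml.etree.ElementTree as ET


def default_fs_matches(core_site_xml: str, account: str, container: str) -> bool:
    """Hypothetical helper: True if fs.default.name targets the given account and container."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == "fs.default.name":
            # Tolerate surrounding whitespace and a trailing slash in the value.
            value = (prop.findtext("value") or "").strip().rstrip("/")
            expected = f"wasb://{container}@{account}.blob.core.windows.net"
            return value in (expected, "wasbs" + expected[4:])
    return False  # property missing: the cluster is not pointing at WASB at all
```

A False result means the cluster is configured for a different account or container than the one you expected.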

The core-site.xml file also has a property for the storage account key, as shown in Listing 13-2. If you encounter HTTP 502 or 403 (Forbidden/authentication) errors while accessing your storage, make sure that the correct storage account key is provided.
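This Python sketch assumes the storage key is stored under a property named fs.azure.account.key.<account>.blob.core.windows.net, following the hadoop-azure naming convention; compare that name with Listing 13-2 for your cluster. check_storage_key is a hypothetical helper that reports whether a key entry exists and whether it at least decodes as base64, without printing the key itself. Passing this check does not prove the key is current, only that it is not obviously truncated or mangled.

```python
import base64
import binascii
import xml.etree.ElementTree as ET


def check_storage_key(core_site_xml: str, account: str) -> str:
    """Hypothetical helper: short diagnosis of the storage-key property for one account."""
    # Assumed property name (hadoop-azure convention); verify against Listing 13-2.
    key_name = f"fs.azure.account.key.{account}.blob.core.windows.net"
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == key_name:
            value = (prop.findtext("value") or "").strip()
            if not value:
                return "key property present but empty"
            try:
                # validate=True rejects characters outside the base64 alphabet.
                base64.b64decode(value, validate=True)
            except binascii.Error:
                return "key is not valid base64 (check for truncation or stray characters)"
            return "key present"
    return f"no {key_name} property found"
```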
