<pool name="default">
  <minMaps>10</minMaps>
  <minReduces>10</minReduces>
  <maxMaps>50</maxMaps>
  <maxReduces>50</maxReduces>
  <maxRunningJobs>1000</maxRunningJobs>
  <weight>1</weight>
</pool>
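For reference, the three pool definitions sit together in a single allocations file, referenced by the mapred.fairscheduler.allocation.file property. The sketch below shows the high_pool entry with its weight of 3; the exact file name and location are assumptions for your install:

```xml
<?xml version="1.0"?>
<!-- Sketch of a fair-scheduler allocations file holding the pools.
     high_pool carries weight 3, so it receives three times the
     cluster share of the other pools. -->
<allocations>
  <pool name="high_pool">
    <minMaps>10</minMaps>
    <minReduces>10</minReduces>
    <maxMaps>50</maxMaps>
    <maxReduces>50</maxReduces>
    <maxRunningJobs>1000</maxRunningJobs>
    <weight>3</weight>
  </pool>
  <!-- low_pool and default repeat the same limits with <weight>1</weight> -->
</allocations>
```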
Here, I define three pools ( high_pool , low_pool , and default ), each with the same minimum and maximum limits for Map and Reduce tasks. All three pools allow a maximum of 1,000 running jobs, but high_pool has three times the weighting, so it will receive three times the share of the cluster that the other pools receive.
To show how the Fair scheduler works, I will run a Pig-based job as an example. Before running it, I restart the Map Reduce servers, as I did for the Capacity scheduler earlier, so that the configuration changes are picked up. Using the -D switch on the command line, I set the mapred.fairscheduler.pool property to specify the pool on which my Pig Latin job will be placed, as follows.
[hadoop@hc1nn pig]$ pig -Dmapred.fairscheduler.pool=high_pool wordcount2.pig
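As an alternative to the -D switch, the same property can be embedded in the Pig Latin script itself with Pig's set command (a sketch; verify the syntax against your Pig version):

```pig
-- At the top of wordcount2.pig: route this script's MapReduce jobs to high_pool
set mapred.fairscheduler.pool 'high_pool';
```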
Note: If I did not specify the pool by using the -D option, I would encounter an error because, by
default, the Fair scheduler assumes that the pool name matches the Linux account name. Given that I am running
the job under the Linux hadoop account, the Fair scheduler would look for a pool named “hadoop,” which does not
exist. The error message I would receive is an UndeclaredPoolException, such as the following:
Failed Jobs:
JobId Alias Feature Message Outputs
N/A clines,gword,rlines,wcount,words GROUP_BY,COMBINER
Message: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.UndeclaredPoolException:
Pool name: 'hadoop' is invalid. Add pool name to the fair scheduler allocation file. Valid pools
are: high_pool, low_pool
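An alternative to naming the pool on every submission is to allow jobs to run in pools that are not declared in the allocation file. In the MR1 Fair scheduler this is governed by the mapred.fairscheduler.allow-undeclared-pools property in mapred-site.xml (a sketch; verify the property name and its default against your Hadoop version's Fair scheduler documentation):

```xml
<!-- mapred-site.xml: permit jobs to run in pools not declared in the
     allocation file, instead of failing with UndeclaredPoolException. -->
<property>
  <name>mapred.fairscheduler.allow-undeclared-pools</name>
  <value>true</value>
</property>
```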
Proceeding with the Fair scheduler example, I can confirm that the scheduler started without error by
checking the JobTracker's log in the install's logs directory. I look for the following line:
2014-06-29 13:41:27,882 INFO org.apache.hadoop.mapred.FairScheduler: Successfully configured
FairScheduler
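To find that line quickly, grep the log for the FairScheduler marker. The log path varies by install, so the sketch below greps a copied sample line rather than a real log file:

```shell
# Write a sample of the expected log line, then grep for the success marker.
# In practice, point grep at your JobTracker log under the install's logs directory.
echo "2014-06-29 13:41:27,882 INFO org.apache.hadoop.mapred.FairScheduler: Successfully configured FairScheduler" > /tmp/jobtracker-sample.log
grep -c "Successfully configured FairScheduler" /tmp/jobtracker-sample.log
```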
By checking the Job Tracker user interface, I can learn more about the job details. Figure 5-2 shows a compound
image of the Pig Latin job that I submitted to the high_pool pool. The top table is taken from the Job Tracker
user interface's list of running jobs; it shows that the hadoop user has submitted the Pig Latin job
wordcount2.pig, which is running.
 