<pool name="default">
  <minMaps>10</minMaps>
  <minReduces>10</minReduces>
  <maxMaps>50</maxMaps>
  <maxReduces>50</maxReduces>
  <maxRunningJobs>1000</maxRunningJobs>
  <weight>1</weight>
</pool>
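For reference, the three pool definitions sit together in a single allocations file, referenced by the mapred.fairscheduler.allocation.file property. The sketch below shows the high_pool entry with its weight of 3; the exact file name and location are assumptions for your install:

```xml
<?xml version="1.0"?>
<!-- Sketch of a fair-scheduler allocations file holding the pools.
     high_pool carries weight 3, so it receives three times the
     cluster share of the other pools. -->
<allocations>
  <pool name="high_pool">
    <minMaps>10</minMaps>
    <minReduces>10</minReduces>
    <maxMaps>50</maxMaps>
    <maxReduces>50</maxReduces>
    <maxRunningJobs>1000</maxRunningJobs>
    <weight>3</weight>
  </pool>
  <!-- low_pool and default repeat the same limits with <weight>1</weight> -->
</allocations>
```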
Here, I define three pools ( high_pool , low_pool , and default ), each with the same minimum and maximum limits for Map and Reduce tasks. All three pools allow a maximum of 1,000 running jobs, but high_pool has three times the weighting, so it will receive three times the share of the cluster that the other pools receive.
To show how the Fair scheduler works, I will run a Pig-based job as an example. Before running it, I restart the Map Reduce servers, as I did for the Capacity scheduler earlier, so that the configuration changes are picked up. Using the -D switch on the command line, I set the mapred.fairscheduler.pool property to specify the pool on which my Pig Latin job will be placed, as follows.
[hadoop@hc1nn pig]$ pig -Dmapred.fairscheduler.pool=high_pool wordcount2.pig
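As an alternative to the -D switch, the same property can be embedded in the Pig Latin script itself with Pig's set command (a sketch; verify the syntax against your Pig version):

```pig
-- At the top of wordcount2.pig: route this script's MapReduce jobs to high_pool
set mapred.fairscheduler.pool 'high_pool';
```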
Note: If I did not specify the pool by using the -D option, I would encounter an error because, by
default, the Fair scheduler assumes that the pool name matches the Linux account name. Given that I am running
the job under the Linux hadoop account, the Fair scheduler would look for a pool named “hadoop,” which does not
exist. The error message I would receive is an UndeclaredPoolException, such as the following:
Failed Jobs:
JobId Alias Feature Message Outputs
N/A clines,gword,rlines,wcount,words GROUP_BY,COMBINER
Message: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.UndeclaredPoolException:
Pool name: 'hadoop' is invalid. Add pool name to the fair scheduler allocation file. Valid pools
are: high_pool, low_pool
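An alternative to naming the pool on every submission is to allow jobs to run in pools that are not declared in the allocation file. In the MR1 Fair scheduler this is governed by the mapred.fairscheduler.allow-undeclared-pools property in mapred-site.xml (a sketch; verify the property name and its default against your Hadoop version's Fair scheduler documentation):

```xml
<!-- mapred-site.xml: permit jobs to run in pools not declared in the
     allocation file, instead of failing with UndeclaredPoolException. -->
<property>
  <name>mapred.fairscheduler.allow-undeclared-pools</name>
  <value>true</value>
</property>
```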
Proceeding with the Fair scheduler example, I can confirm that the scheduler started without error by
checking the JobTracker's log in the install's logs directory. I look for the following line:
2014-06-29 13:41:27,882 INFO org.apache.hadoop.mapred.FairScheduler: Successfully configured
FairScheduler
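To find that line quickly, grep the log for the FairScheduler marker. The log path varies by install, so the sketch below greps a copied sample line rather than a real log file:

```shell
# Write a sample of the expected log line, then grep for the success marker.
# In practice, point grep at your JobTracker log under the install's logs directory.
echo "2014-06-29 13:41:27,882 INFO org.apache.hadoop.mapred.FairScheduler: Successfully configured FairScheduler" > /tmp/jobtracker-sample.log
grep -c "Successfully configured FairScheduler" /tmp/jobtracker-sample.log
```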
By checking the Job Tracker user interface, I can learn more about the job details. Figure 5-2 shows a compound
image of the Pig Latin job that I submitted to the high_pool pool. The top table is taken from the Job Tracker
user interface's list of running jobs; it shows that the hadoop user has submitted the Pig Latin job
wordcount2.pig, which is running.
 