Managing Hadoop - Hadoop in Action

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>HADOOP_CONF_DIR/pools.xml</value>
</property>
<property>
  <name>mapred.fairscheduler.assignmultiple</name>
</property>
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>

The allocation file pools.xml defines the pools for the scheduler. It gives each pool
a name and capacity constraints. The constraints can include the minimum number
of map slots or reduce slots. They can also include the maximum number of running
jobs. In addition, you can set the maximum number of running jobs per user, and also
override this maximum for specific users. An example pools.xml looks like this:

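<?xml version="1.0"?>
<allocations>
  <pool name="ads">
    <minMaps>2</minMaps>
    <minReduces>2</minReduces>
  </pool>
  <pool name="hive">
    <minMaps>2</minMaps>
    <minReduces>2</minReduces>
    <maxRunningJobs>2</maxRunningJobs>
  </pool>
  <user name="chuck">
    <maxRunningJobs>6</maxRunningJobs>
  </user>
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>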

This pools.xml defines two special pools, “ads” and “hive”. Each is guaranteed to have
at least two map slots and two reduce slots. The “hive” pool is limited to running at
most two jobs at once. To use these pools, you set the pool.name property in a job's
configuration to either “ads” or “hive”. This pools.xml also caps the number of
simultaneously running jobs per user at three, but the user “chuck” is given a higher
cap of six.
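
As a minimal sketch of that last step, a job could select the “ads” pool by carrying the following property in its own configuration. This assumes the scheduler's mapred.fairscheduler.poolnameproperty is set to pool.name, as the text implies; the choice of “ads” here is only for illustration.

```xml
<!-- Job-level configuration fragment (illustrative): routes this job to the "ads" pool. -->
<!-- Assumes mapred.fairscheduler.poolnameproperty is set to pool.name on the JobTracker. -->
<property>
  <name>pool.name</name>
  <value>ads</value>
</property>
```

For jobs that parse Hadoop's generic options, the same property can typically be supplied on the command line instead, in the form -D pool.name=ads.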

Note that the pools.xml file is reread every 15 seconds. You can modify this file and
dynamically reallocate capacity at run time. Any pool not defined in this file has no
guaranteed capacity and no limit on the number of jobs running at once.
