<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
<name>mapred.fairscheduler.allocation.file</name>
<value>HADOOP_CONF_DIR/pools.xml</value>
</property>
<property>
<name>mapred.fairscheduler.assignmultiple</name>
<value>true</value>
</property>
<property>
<name>mapred.fairscheduler.poolnameproperty</name>
<value>pool.name</value>
</property>
<property>
<name>pool.name</name>
<value>${user.name}</value>
</property>
The allocation file pools.xml defines the pools for the scheduler. It gives each pool
a name and capacity constraints. The constraints can include the minimum number
of map slots or reduce slots, as well as the maximum number of running jobs. In
addition, you can set a default maximum number of running jobs per user and
override this maximum for specific users. An example pools.xml looks like this:
<?xml version="1.0"?>
<allocations>
  <pool name="ads">
    <minMaps>2</minMaps>
    <minReduces>2</minReduces>
  </pool>
  <pool name="hive">
    <minMaps>2</minMaps>
    <minReduces>2</minReduces>
    <maxRunningJobs>2</maxRunningJobs>
  </pool>
  <user name="chuck">
    <maxRunningJobs>6</maxRunningJobs>
  </user>
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>
This pools.xml defines two special pools, “ads” and “hive”. Each is guaranteed to have
at least two map slots and two reduce slots. The “hive” pool is limited to running at
most two jobs at once. To use these pools, you set the pool.name property in a job's
configuration to either “ads” or “hive”. This pools.xml also caps the number of
simultaneously running jobs a user can have at three, but the user “chuck” is given a
higher cap of six.
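To check how these limits resolve for a given pool or user, you can parse the allocation file yourself. The following is a minimal sketch, not part of Hadoop: it uses Python's standard xml.etree.ElementTree module, and the helper names (load_allocations, max_jobs_for_user) and the user “alice” are hypothetical, introduced only for illustration. It assumes the file contains just the elements shown in the example above.

```python
import xml.etree.ElementTree as ET

# The example allocation file from the text, inlined so the sketch is self-contained.
POOLS_XML = """<?xml version="1.0"?>
<allocations>
  <pool name="ads">
    <minMaps>2</minMaps>
    <minReduces>2</minReduces>
  </pool>
  <pool name="hive">
    <minMaps>2</minMaps>
    <minReduces>2</minReduces>
    <maxRunningJobs>2</maxRunningJobs>
  </pool>
  <user name="chuck">
    <maxRunningJobs>6</maxRunningJobs>
  </user>
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>
"""


def _int_child(element, tag):
    """Return the integer text of a child element, or None if it is absent."""
    child = element.find(tag)
    return int(child.text) if child is not None else None


def load_allocations(xml_text):
    """Hypothetical helper: return (pools, per-user caps, default user cap)."""
    root = ET.fromstring(xml_text)
    pools = {
        pool.get("name"): {
            "minMaps": _int_child(pool, "minMaps"),
            "minReduces": _int_child(pool, "minReduces"),
            "maxRunningJobs": _int_child(pool, "maxRunningJobs"),
        }
        for pool in root.findall("pool")
    }
    user_caps = {
        user.get("name"): _int_child(user, "maxRunningJobs")
        for user in root.findall("user")
    }
    default_cap = _int_child(root, "userMaxJobsDefault")
    return pools, user_caps, default_cap


def max_jobs_for_user(user, user_caps, default_cap):
    """A per-user override wins; otherwise fall back to userMaxJobsDefault."""
    cap = user_caps.get(user)
    return default_cap if cap is None else cap


pools, user_caps, default_cap = load_allocations(POOLS_XML)
print(pools["hive"]["maxRunningJobs"])                     # 2
print(pools["ads"]["maxRunningJobs"])                      # None (no job limit set)
print(max_jobs_for_user("chuck", user_caps, default_cap))  # 6
print(max_jobs_for_user("alice", user_caps, default_cap))  # 3
```

The sketch only mirrors the rules described in the text: “chuck” gets his override of six, and any other user falls back to the default of three.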
Note that the pools.xml file is reread every 15 seconds. You can modify this file and
dynamically reallocate capacity at run time. Any pool not defined in this file has no
guaranteed capacity and no limit on the number of jobs running at once.
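Because the scheduler rereads the file on its own schedule, a script that edits pools.xml should avoid leaving a half-written file on disk. The following is one possible sketch in Python, again using only the standard library. The function name set_pool_min_maps is hypothetical, and writing to a temporary file and then renaming it into place is a precaution assumed here, not something the scheduler is documented to require.

```python
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET


def set_pool_min_maps(path, pool_name, min_maps):
    """Hypothetical helper: set <minMaps> for one pool in an allocation file."""
    tree = ET.parse(path)
    root = tree.getroot()

    # Find the pool by name, creating it if the file does not define it yet.
    pool = next((p for p in root.findall("pool") if p.get("name") == pool_name), None)
    if pool is None:
        pool = ET.SubElement(root, "pool", {"name": pool_name})

    node = pool.find("minMaps")
    if node is None:
        node = ET.SubElement(pool, "minMaps")
    node.text = str(min_maps)

    # Write the new version next to the target so the rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        tree.write(tmp_path, encoding="utf-8", xml_declaration=True)
        # mkstemp creates a private file; keep the original file's permission bits.
        shutil.copymode(path, tmp_path)
        # Rename over the old file so a reader sees either the old or the new version.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A call such as set_pool_min_maps("HADOOP_CONF_DIR/pools.xml", "ads", 4), with HADOOP_CONF_DIR replaced by your actual configuration directory, would raise the guaranteed map slots for the “ads” pool. The change takes effect the next time the scheduler rereads the file.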