<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>HADOOP_CONF_DIR/pools.xml</value>
</property>
<property>
  <name>mapred.fairscheduler.assignmultiple</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>
<property>
  <name>pool.name</name>
  <value>${user.name}</value>
</property>
The allocation file pools.xml defines the pools for the scheduler. It gives each pool
a name and capacity constraints. The constraints can include the minimum number
of map slots or reduce slots. They can also include the maximum number of running
jobs. In addition, you can set the maximum number of running jobs per user, and also
override this maximum for specific users. An example pools.xml looks like this:
<?xml version="1.0"?>
<allocations>
  <pool name="ads">
    <minMaps>2</minMaps>
    <minReduces>2</minReduces>
  </pool>
  <pool name="hive">
    <minMaps>2</minMaps>
    <minReduces>2</minReduces>
    <maxRunningJobs>2</maxRunningJobs>
  </pool>
  <user name="chuck">
    <maxRunningJobs>6</maxRunningJobs>
  </user>
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>
This pools.xml defines two special pools, “ads” and “hive”. Each is guaranteed to have
at least two map slots and two reduce slots. The “hive” pool is limited to running at
most two jobs at once. To use these pools, you set the pool.name property in a job's
configuration to either “ads” or “hive”. This pools.xml also caps the number of
simultaneously running jobs a user can have at three, but the user “chuck” is given
a higher cap of six.
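As a minimal sketch of the job-side setting, a job's configuration could carry a property like the one below to place the job in the “ads” pool. This assumes the scheduler is configured as shown earlier, with mapred.fairscheduler.poolnameproperty set to pool.name. A job that does not set the property falls back to the default value of ${user.name}.

```xml
<!-- Job configuration fragment: submit this job to the "ads" pool. -->
<!-- Assumes mapred.fairscheduler.poolnameproperty is set to pool.name. -->
<property>
  <name>pool.name</name>
  <value>ads</value>
</property>
```

Using “hive” as the value instead would place the job under that pool's limit of two running jobs.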
Note that the scheduler rereads the pools.xml file every 15 seconds, so you can modify
the file to reallocate capacity dynamically at run time. Any pool not defined in this
file has no guaranteed capacity and no limit on the number of jobs running at once.
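For illustration, a revised pools.xml along the following lines could be saved over the existing file to shift capacity while the cluster is running. The new values are hypothetical: the “ads” minimums are raised from two to four, and the “hive” job limit is raised from two to three. Everything else matches the earlier example.

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="ads">
    <!-- Hypothetical change: raised from 2 to 4 guaranteed slots of each kind -->
    <minMaps>4</minMaps>
    <minReduces>4</minReduces>
  </pool>
  <pool name="hive">
    <minMaps>2</minMaps>
    <minReduces>2</minReduces>
    <!-- Hypothetical change: raised from 2 to 3 concurrent jobs -->
    <maxRunningJobs>3</maxRunningJobs>
  </pool>
  <user name="chuck">
    <maxRunningJobs>6</maxRunningJobs>
  </user>
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>
```

Because the file is reread periodically, a change like this would be expected to take effect within about 15 seconds of saving.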
 