Resource-based Scheduling: Capacity uses an algorithm that supports memory-based resource scheduling for resource-intensive jobs.
Hierarchical Queues: When used with Hadoop V2, Capacity supports a hierarchy of queues, so that underutilized resources are first shared among subqueues before being allocated to other cluster tenant queues.
Job Priorities: In Hadoop V1, the scheduler supports scheduling by job priority.
Operability: Capacity lets you change a queue's configuration at runtime through a console that also lets you view the queues. In Hadoop V2, you can also stop a queue so that it drains: it accepts no new jobs, while the jobs already running are allowed to finish.
The Fair Scheduler
Fair aims to do what its name implies: share resources fairly among all jobs within a cluster that is owned and used by a single organization. Over time, it aims to share resources evenly among job pools. Some key aspects of Fair are:
Organization: This scheduler organizes jobs into pools and shares resources among the pools. Attributes such as priorities act as weights when the resources are shared.
Resource Sharing: You can guarantee a pool a minimum share of resources. If a pool is empty, Fair shares its resources among the other pools.
Resource Limits: With Fair, you can specify concurrent job limits per user and per pool to limit the load on the cluster.
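The pool weights, minimum shares, and redistribution of an empty pool's resources described above can be combined into one share calculation. The sketch below is modeled loosely on a weight-to-share-ratio binary search; it is an illustrative assumption, not Hadoop's code, and the function name `fair_shares` and the `(weight, min_share, demand)` tuple layout are hypothetical.

```python
# Sketch of weighted fair sharing with minimum shares; a simplified model,
# not the Fair scheduler's actual implementation.


def fair_shares(total, pools, iterations=100):
    """Compute each pool's share of `total` slots.

    `pools` maps a pool name to (weight, min_share, demand). A pool's share at
    weight-to-share ratio r is min(max(r * weight, min_share), demand); r is found
    by binary search so that the shares add up to `total`. Assumes every weight is
    positive and the minimum shares sum to no more than `total`.
    """

    def share_at(r, weight, min_share, demand):
        return min(max(r * weight, min_share), demand)

    def used(r):
        return sum(share_at(r, *spec) for spec in pools.values())

    # If every pool's demand fits, each pool simply gets what it asks for.
    if sum(demand for _, _, demand in pools.values()) <= total:
        return {name: float(demand) for name, (_, _, demand) in pools.items()}

    lo, hi = 0.0, 1.0
    while used(hi) < total:  # grow the upper bound until it over-allocates
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if used(mid) < total:
            lo = mid
        else:
            hi = mid
    return {name: share_at(hi, *spec) for name, spec in pools.items()}


if __name__ == "__main__":
    pools = {
        "production": (2.0, 0, 100),  # weight 2: twice the share of "research"
        "research": (1.0, 0, 100),
        "idle": (1.0, 0, 0),  # empty pool: its share goes to the others
    }
    print(fair_shares(90, pools))
```

With 90 slots, "production" converges to about 60 and "research" to about 30, while the empty "idle" pool gets nothing. Giving a pool a minimum share raises its floor without changing how the rest is split by weight.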
Scheduling in Hadoop V1
Now that you have a sense of each scheduler's strengths, you're ready to see them put to work. This section
demonstrates job scheduling in a Hadoop V1 environment. You'll learn how to configure the Capacity and Fair
schedulers, and you'll see that the libraries necessary to use them are already supplied with Hadoop V1.2.1, just
waiting for you to plug them in.
V1 Capacity Scheduler
As mentioned, the library used by the Hadoop Capacity scheduler is included in the V1.2.1 release within the lib
directory of the installation, as you can see:
[hadoop@hc1nn lib]$ pwd
/usr/local/hadoop/lib
[hadoop@hc1nn lib]$ ls -l hadoop-capacity-scheduler*
-rw-rw-r--. 1 hadoop hadoop 58461 Jul 23 2013 hadoop-capacity-scheduler-1.2.1.jar
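If you script your installation checks, the same lookup can be done programmatically. This is a minimal sketch, assuming the jar follows the `hadoop-capacity-scheduler-<version>.jar` naming shown in the listing; the helper name `capacity_jar_version` is hypothetical.

```python
import os
import re

# Matches file names such as hadoop-capacity-scheduler-1.2.1.jar and captures the version.
JAR_PATTERN = re.compile(r"^hadoop-capacity-scheduler-(\d+(?:\.\d+)*)\.jar$")


def capacity_jar_version(filenames):
    """Return the version of the first Capacity scheduler jar in `filenames`, or None."""
    for name in sorted(filenames):
        match = JAR_PATTERN.match(name)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    lib_dir = "/usr/local/hadoop/lib"  # installation path shown in the listing above
    if os.path.isdir(lib_dir):
        print(capacity_jar_version(os.listdir(lib_dir)))
    else:
        print("lib directory not found:", lib_dir)
```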
To use the library, you plug it into the configuration by adding the following property to the mapred-site.xml file
in the conf directory of the installation:
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  <description>Plug in the Capacity scheduler</description>
</property>
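If you manage several nodes, you may prefer to apply this property with a script rather than by hand. The sketch below uses Python's standard `xml.etree.ElementTree` to add or update a property in configuration XML; the helper names `set_property` and `get_property` are hypothetical. ElementTree does not preserve the XML declaration or comments when it rewrites a document, so keep a backup of the original mapred-site.xml.

```python
import xml.etree.ElementTree as ET

SCHEDULER_KEY = "mapred.jobtracker.taskScheduler"
CAPACITY_CLASS = "org.apache.hadoop.mapred.CapacityTaskScheduler"


def set_property(xml_text, name, value, description=None):
    """Return configuration XML with property `name` set to `value`.

    Updates the property in place if it already exists; otherwise appends a new
    <property> element (with an optional <description>).
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:  # no existing property matched: append a new one
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        if description:
            ET.SubElement(prop, "description").text = description
    return ET.tostring(root, encoding="unicode")


def get_property(xml_text, name):
    """Return the value of property `name`, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return prop.findtext("value")
    return None


if __name__ == "__main__":
    site = "<configuration></configuration>"  # stand-in for conf/mapred-site.xml
    updated = set_property(site, SCHEDULER_KEY, CAPACITY_CLASS, "Plug in the Capacity scheduler")
    print(updated)
```

Updating an existing entry instead of always appending keeps the file free of duplicate `mapred.jobtracker.taskScheduler` properties if the script is run more than once.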