Scheduling and Workflow - Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

Resource-based Scheduling: Capacity uses an algorithm that supports memory-based resource scheduling for jobs that are resource intensive.

Hierarchical Queues: When used with Hadoop V2, Capacity supports a hierarchy of queues, so that underutilized resources are first shared among subqueues before they are allocated to other cluster tenant queues.

Job Priorities: In Hadoop V1, the scheduler supports scheduling by job priority.

Operability: Capacity lets you change the configuration of a queue at runtime, and it provides a console for viewing the queues. In Hadoop V2, you can also stop a queue to let it drain.

The Fair Scheduler

Fair aims to do what its name implies: share resources fairly among all jobs within a cluster that is owned and used by a single organization. Over time, it aims to share resources evenly among job pools. Some key aspects of Fair are:

Organization: This scheduler organizes jobs into pools, with resources shared among the pools. Attributes such as priorities act as weights when the resources are shared.

Resource Sharing: You can specify a minimum level of resources for a pool. If a pool is idle, Fair shares that pool's unused resources with the other pools.

Resource Limits: With Fair, you can specify concurrent job limits by user and by pool so as to limit the load on the cluster.
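The pool behavior described above can be illustrated with a small sketch. It is a simplified model and not the Fair scheduler's actual algorithm: each active pool first receives its minimum share, and the remaining slots are divided in proportion to pool weights. The `Pool` class, the `fair_shares` function, and the pool names are hypothetical, and weights are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str            # hypothetical pool name
    weight: float        # relative weight used when sharing spare slots
    min_share: int       # guaranteed minimum number of slots
    active: bool = True  # an idle pool gives up its share


def fair_shares(pools, total_slots):
    """Return {pool name: slots} under a simplified weighted fair-share model."""
    active = [p for p in pools if p.active]
    shares = {p.name: 0.0 for p in pools}
    if not active:
        return shares
    # Step 1: grant every active pool its guaranteed minimum.
    for p in active:
        shares[p.name] = float(p.min_share)
    spare = total_slots - sum(p.min_share for p in active)
    if spare < 0:
        raise ValueError("minimum shares exceed cluster capacity")
    # Step 2: split whatever is left in proportion to the pool weights.
    total_weight = sum(p.weight for p in active)
    for p in active:
        shares[p.name] += spare * p.weight / total_weight
    return shares


pools = [
    Pool("etl", weight=2.0, min_share=10),
    Pool("adhoc", weight=1.0, min_share=0),
    Pool("reports", weight=1.0, min_share=5, active=False),
]
print(fair_shares(pools, total_slots=40))
# → {'etl': 30.0, 'adhoc': 10.0, 'reports': 0.0}
```

Because the `reports` pool is idle, its guaranteed slots return to the shared total, and the two active pools split the spare capacity 2:1 according to their weights.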

Scheduling in Hadoop V1

Now that you have a sense of each scheduler's strengths, you're ready to see them put to work. This section demonstrates job scheduling in a Hadoop V1 environment. You'll learn how to configure the Capacity and Fair schedulers, and you'll see that the libraries necessary to use them are already supplied with Hadoop V1.2.1, just waiting for you to plug them in.

V1 Capacity Scheduler

As mentioned, the library used by the Hadoop Capacity scheduler is included in the V1.2.1 release within the lib directory of the installation, as you can see:

[hadoop@hc1nn lib]$ pwd
/usr/local/hadoop/lib
[hadoop@hc1nn lib]$ ls -l hadoop-capacity-scheduler*
-rw-rw-r--. 1 hadoop hadoop 58461 Jul 23 2013 hadoop-capacity-scheduler-1.2.1.jar

To use the library, you plug it into the configuration by adding the following property to the mapred-site.xml file in the conf directory of the installation:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  <description>Plug in the Capacity scheduler</description>
</property>
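After editing the file, it can be useful to confirm which scheduler class the configuration actually names. The following is a minimal sketch that parses Hadoop-style configuration XML and looks up a property by name. The `read_property` helper and the sample string are illustrative assumptions, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

SCHEDULER_KEY = "mapred.jobtracker.taskScheduler"


def read_property(xml_text, key):
    """Return the value of a Hadoop configuration property, or None if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == key:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Sample content standing in for conf/mapred-site.xml.
sample = """<configuration>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
    <description>Plug in the Capacity scheduler</description>
  </property>
</configuration>
"""

print(read_property(sample, SCHEDULER_KEY))
# → org.apache.hadoop.mapred.CapacityTaskScheduler
```

To check a real installation, read the file's contents and pass them to `read_property`. Given the lib directory shown earlier, the file would presumably be /usr/local/hadoop/conf/mapred-site.xml, though that exact path is an assumption.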
