<property>
<name>yarn.scheduler.capacity.root.client1.user-limit-factor</name> <value>1</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.client1.maximum-capacity</name> <value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.client1.state</name> <value>RUNNING</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.client1.acl_submit_applications</name> <value>*</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.client1.acl_administer_queue</name> <value>*</value>
</property>
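The properties above configure the client1 queue itself, but the job submitted later in this section targets a child queue named client1a, which means the parent queue must also declare its children. A minimal sketch of how such child queues could be declared is shown below; the queue names client1a and client1b and the 50/50 capacity split are illustrative assumptions, not values taken from this listing:

```xml
<!-- Hypothetical sketch: declaring child queues under root.client1.
     Queue names and capacity percentages are illustrative only;
     child capacities under one parent must sum to 100. -->
<property>
  <name>yarn.scheduler.capacity.root.client1.queues</name>
  <value>client1a,client1b</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.client1.client1a.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.client1.client1b.capacity</name>
  <value>50</value>
</property>
```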
Now I have YARN refresh its queue configuration by running the yarn rmadmin command with the -refreshQueues
option. This causes YARN to reread its configuration files and pick up the changes that have been made:
[hadoop@hc1nn conf]$ yarn rmadmin -refreshQueues
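After the refresh, the queue configuration can be checked from the command line before submitting any jobs. A quick sketch, assuming the Hadoop 2 mapred CLI is on the path and a cluster is running (the output format will vary with your installation):

```shell
# List the queues the Capacity Scheduler now knows about,
# along with their state, capacity, and current usage.
mapred queue -list
```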
My reconfigured scheduler is ready, and I now submit a word-count job to show the queues in use. To demonstrate
the queues in action, however, I need some test data; therefore, I have created and populated the job's input data in the
HDFS directory /usr/hadoop/edgar, as the HDFS file system command shows:
[hadoop@hc1nn edgar]$ hdfs dfs -ls /usr/hadoop/edgar
Found 5 items
-rw-r--r-- 2 hadoop hadoop 410012 2014-07-01 18:14 /usr/hadoop/edgar/10031.txt
-rw-r--r-- 2 hadoop hadoop 559352 2014-07-01 18:14 /usr/hadoop/edgar/15143.txt
-rw-r--r-- 2 hadoop hadoop 66401 2014-07-01 18:14 /usr/hadoop/edgar/17192.txt
-rw-r--r-- 2 hadoop hadoop 596736 2014-07-01 18:14 /usr/hadoop/edgar/2149.txt
-rw-r--r-- 2 hadoop hadoop 63278 2014-07-01 18:14 /usr/hadoop/edgar/932.txt
The word-count job will read this data, run a word count, and place the output results in the HDFS directory /
usr/hadoop/edgar-results1. The word-count job command looks like this:
hadoop \
jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
wordcount \
-Dmapred.job.queue.name=client1a \
/usr/hadoop/edgar \
/usr/hadoop/edgar-results1
The backslash characters (\) allow me to spread the command over multiple lines to make it more readable. I've
used a -D option to specify the queue in which to place this job (client1a).
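The mapred.job.queue.name property still works in Hadoop 2, but it was deprecated in favor of mapreduce.job.queuename. The same submission using the newer property name would look like this (same jar, paths, and queue as above):

```shell
# Same word-count submission, using the Hadoop 2 property name
# for the target queue instead of the deprecated one.
hadoop \
jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
wordcount \
-Dmapreduce.job.queuename=client1a \
/usr/hadoop/edgar \
/usr/hadoop/edgar-results1
```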
Now I examine the scheduler configuration via the Resource Manager web interface on the server hc1nn, with a port
value of 8088, taken from the property yarn.resourcemanager.webapp.address in the configuration file yarn-site.xml:
http://hc1nn:8088/cluster/scheduler. Figure 5-3 illustrates the resulting hierarchy of job queues, with the
currently running word-count Map Reduce job placed in the client1a child queue.
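Once the job completes, its results can also be inspected directly in HDFS. A sketch of how that might look; the part-r-00000 file name assumes the job ran with a single reducer:

```shell
# List the job's output directory, then view the first few
# word counts (assumes a single reducer wrote part-r-00000).
hdfs dfs -ls /usr/hadoop/edgar-results1
hdfs dfs -cat /usr/hadoop/edgar-results1/part-r-00000 | head -20
```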