ZooKeeper ZNodes and the commands that manipulate them are building blocks that you can combine to meet your own needs. Ephemeral nodes are especially useful in a distributed clustered environment, for example: each session could create an ephemeral node, letting you see at a glance which application nodes were currently connected. You could also store all of your configuration information in a series of
ZooKeeper ZNodes and have the application on each node read its configuration from them; you would then be
able to ensure that every node was using the same configuration information.
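To make the connected-nodes idea concrete, here is a minimal sketch using the zkCli.sh client that ships with ZooKeeper; the /app-members path and the node data are illustrative names, not part of any standard layout:
[zk: localhost:2181(CONNECTED) 0] create /app-members ""
[zk: localhost:2181(CONNECTED) 1] create -e /app-members/hc1r1m1 "up"
[zk: localhost:2181(CONNECTED) 2] ls /app-members
[hc1r1m1]
The first command creates a normal (persistent) parent ZNode; the second uses the -e flag to create an ephemeral child, which ZooKeeper deletes automatically when the creating session ends, so listing /app-members always shows only the currently connected nodes.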
This has been a basic introduction to ZooKeeper. For further reading, have a look at Cloudera's site or perhaps have a
go at building your own distributed application.
Hadoop MRv2 and YARN
With ZooKeeper in place, you can continue installing the Cloudera CDH 4 release. The components are installed
as root, using yum commands to pull in the Cloudera packages. I chose to install a Cloudera stack because the installation
has been professionally tested and packaged; the components are guaranteed to work together and with a range
of Hadoop client applications. The instructions that follow describe the installation of the Name Node, Data Node,
Resource Manager, Node Manager, Job History, and Proxy servers.
In comparison to the V1 installation, you do not have to choose a location for the installation; that is done
automatically, and the different parts of the installation are placed in meaningful locations. Configuration is placed
under /etc/hadoop, logs are placed under /var/log, and the components are set up as Linux services, with start-up scripts under /etc/init.d.
Here's the process:
1. Install the HDFS Name Node component on the master server hc1nn:
[root@hc1nn ~]# yum install hadoop-hdfs-namenode
2. Install the HDFS Data Node component on the slave servers hc1r1m1 through 3:
[root@hc1r1m1 ~]# yum install hadoop-hdfs-datanode
3. Install the Resource Manager component on the Name Node machine hc1nn:
[root@hc1nn ~]# yum install hadoop-yarn-resourcemanager
4. Install the Node Manager and Map Reduce on all of the Data Node slave servers hc1r1m1 through 3:
[root@hc1r1m1 ~]# yum install hadoop-yarn-nodemanager hadoop-mapreduce
5. Install the Job History and Proxy servers on a single node:
yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
That concludes the component package installations.
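Before configuring anything, it is worth a quick check that the packages and their service scripts landed where expected. A minimal sketch, run as root on each host (the grep pattern simply matches the package names installed above):
[root@hc1nn ~]# yum list installed | grep ^hadoop
[root@hc1nn ~]# ls /etc/init.d/hadoop-*
The first command lists the installed Hadoop packages; the second lists the Linux service scripts mentioned earlier.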
Now that the software is installed, you need to set up the configuration files that these components depend on. You can find
the configuration files under the directory /etc/hadoop/conf. They all have names like <component>-site.xml,
where <component> is replaced by yarn, hdfs, mapred, or core.
You have come across the HDFS term already; it is the Hadoop Distributed File System. YARN stands for “yet
another resource negotiator.” The mapred component is short for “Map Reduce,” and core covers the configuration for
the Hadoop common utilities that support the other Hadoop functions.
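As a concrete illustration, here is a minimal sketch of what an entry in core-site.xml might look like once configured; the fs.defaultFS property names the default file system, and the hdfs://hc1nn:8020 value is an assumption based on this cluster's Name Node host and the default HDFS port, so adjust it to your own setup:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hc1nn:8020</value>
  </property>
</configuration>
Each of the four site files follows this same property-list format.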
 