ZooKeeper ZNodes and the commands that manipulate them are building blocks that you can combine to meet your own needs. Ephemeral nodes are especially useful in a distributed clustered environment, for example: each session could create an ephemeral node, letting you see at a glance which application nodes were currently connected. You could also store all of your configuration information in a series of
ZooKeeper ZNodes and have the application on each node read its configuration from them; you would then be
able to ensure that every node was using the same configuration information.
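To make the connected-nodes idea concrete, here is a minimal sketch using the zkCli.sh client that ships with ZooKeeper; the /app-members path and the node data are illustrative names, not part of any standard layout:
[zk: localhost:2181(CONNECTED) 0] create /app-members ""
[zk: localhost:2181(CONNECTED) 1] create -e /app-members/hc1r1m1 "up"
[zk: localhost:2181(CONNECTED) 2] ls /app-members
[hc1r1m1]
The first command creates a normal (persistent) parent ZNode; the second uses the -e flag to create an ephemeral child, which ZooKeeper deletes automatically when the creating session ends, so listing /app-members always shows only the currently connected nodes.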
This has been a basic introduction to ZooKeeper. For further reading, have a look at Cloudera's site or perhaps have a
go at building your own distributed application.
Hadoop MRv2 and YARN
With ZooKeeper in place, you can continue installing the Cloudera CDH 4 release. The components are installed
as root, using yum commands to pull in the Cloudera packages. I chose to install a Cloudera stack because the installation
has been professionally tested and packaged; the components are guaranteed to work together and with a range
of Hadoop client applications. The instructions that follow describe the installation of the Name Node, Data Node,
Resource Manager, Node Manager, Job History, and Proxy servers.
In comparison to the V1 installation, you do not have to choose a location for the installation; that is done
automatically, and the different parts of the installation are placed in meaningful locations. Configuration is placed
under /etc/hadoop, logs are placed under /var/log, and the components are set up as Linux services, with start-up scripts under /etc/init.d.
Here's the process:
1. Install the HDFS Name Node component on the master server hc1nn:
[root@hc1nn ~]# yum install hadoop-hdfs-namenode
2. Install the HDFS Data Node component on the slave servers hc1r1m1 through 3:
[root@hc1r1m1 ~]# yum install hadoop-hdfs-datanode
3. Install the Resource Manager component on the Name Node machine hc1nn:
[root@hc1nn ~]# yum install hadoop-yarn-resourcemanager
4. Install the Node Manager and Map Reduce on all of the Data Node slave servers hc1r1m1 through 3:
[root@hc1r1m1 ~]# yum install hadoop-yarn-nodemanager hadoop-mapreduce
5. Install the Job History and Proxy servers on a single node:
yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
That concludes the component package installations.
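Before configuring anything, it is worth a quick check that the packages and their service scripts landed where expected. A minimal sketch, run as root on each host (the grep pattern simply matches the package names installed above):
[root@hc1nn ~]# yum list installed | grep ^hadoop
[root@hc1nn ~]# ls /etc/init.d/hadoop-*
The first command lists the installed Hadoop packages; the second lists the Linux service scripts mentioned earlier.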
Now that the software is installed, you need to set up the configuration files that these components depend on. You can find
the configuration files under the directory /etc/hadoop/conf. They all have names like <component>-site.xml,
where <component> is replaced by yarn, hdfs, mapred, or core.
You have come across the HDFS term already; it is the Hadoop Distributed File System. YARN stands for “yet
another resource negotiator.” The mapred component is short for “Map Reduce,” and core covers the configuration for
the Hadoop common utilities that support the other Hadoop functions.
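As a concrete illustration, here is a minimal sketch of what an entry in core-site.xml might look like once configured; the fs.defaultFS property names the default file system, and the hdfs://hc1nn:8020 value is an assumption based on this cluster's Name Node host and the default HDFS port, so adjust it to your own setup:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hc1nn:8020</value>
  </property>
</configuration>
Each of the four site files follows this same property-list format.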
 