While I use a Hadoop stack in the rest of this book, here I will show the process of downloading, installing,
configuring, and running Hadoop V1 so that you can compare the use of V1 and V2.
Environment Management
Before I move into the Hadoop V1 and V2 installations, I want to point out that I am installing both Hadoop V1 and V2
on the same set of servers. Hadoop V1 is installed under /usr/local while Hadoop V2 is installed as a Cloudera CDH
release and so will have a defined set of directories:
Logging under /var/log; that is, /var/log/hadoop-hdfs/
Configuration under /etc/hadoop/conf/
Executables defined as services under /etc/init.d/; that is, hadoop-hdfs-namenode
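For example, assuming a CDH-style layout like the one above (the hadoop-hdfs-namenode service exists only on the node that hosts that role), you can inspect these locations from the shell:
ls /etc/hadoop/conf                    # configuration files
ls /var/log/hadoop-hdfs                # HDFS log files
service hadoop-hdfs-namenode status    # service script under /etc/init.d/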
I have also created two .bashrc environment configuration files for the Linux hadoop user account:
[hadoop@hc1nn ~]$ pwd
/home/hadoop
[hadoop@hc1nn ~]$ ls -l .bashrc*
lrwxrwxrwx. 1 hadoop hadoop 16 Jun 30 17:59 .bashrc -> .bashrc_hadoopv2
-rw-r--r--. 1 hadoop hadoop 1586 Jun 18 17:08 .bashrc_hadoopv1
-rw-r--r--. 1 hadoop hadoop 1588 Jul 27 11:33 .bashrc_hadoopv2
By switching the .bashrc symbolic link between the Hadoop V1 (.bashrc_hadoopv1) and V2 (.bashrc_hadoopv2)
files, I can quickly move between the two environments, each of which has a completely separate set of
resources. This approach lets me switch between Hadoop versions on my single set of testing servers while
writing this book. From a production viewpoint, however, you would install only one version of Hadoop at a time.
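As a sketch of what such a file might contain (the paths below are assumptions based on the install locations mentioned above, not the exact contents of my files), each version-specific file points the environment at its own Hadoop release:
# .bashrc_hadoopv1 (sketch; adjust the paths to your own install)
export JAVA_HOME=/usr/lib/jvm/java-1.6.0
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
Re-pointing the link is then a one-line operation, followed by starting a new shell:
ln -sfn .bashrc_hadoopv2 .bashrc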
Hadoop V1 Installation
Before you attempt to install Hadoop, you must ensure that Java 1.6.x is installed and that SSH (secure shell) is
installed and running. The master name node must be able to create an SSH session to reach each of its data nodes
without using a password in order to manage them. On CentOS, you can install SSH via the root account as follows:
yum install openssh-server
This will install the secure shell daemon process. Repeat this installation on all of your servers, then start the
service (as root):
service sshd restart
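On CentOS 6 you can also ensure that the daemon starts at boot time (an optional extra step; chkconfig is the standard CentOS 6 tool for this):
chkconfig sshd on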
Now, in order to make the SSH sessions from the name node to the data nodes operate without a password,
you must create an SSH key on the name node and copy it to each of the data nodes. You create the key with
the ssh-keygen command as the hadoop user (I created the hadoop user account during the installation of the CentOS
operating system on each server), as follows:
ssh-keygen
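Accept the defaults when prompted; an empty passphrase is what makes the password-less login possible. You can then copy the public key to each data node and test the connection. The host name hc1r1m1 below is a placeholder; substitute your own data node names:
ssh-copy-id hadoop@hc1r1m1
ssh hadoop@hc1r1m1
The second command should now log you in without prompting for a password.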
 