While I use a Hadoop stack in the rest of this book, here I will show the process of downloading, installing,
configuring, and running Hadoop V1 so that you can compare the use of V1 and V2.
Environment Management
Before I move into the Hadoop V1 and V2 installations, I want to point out that I am installing both Hadoop V1 and V2
on the same set of servers. Hadoop V1 is installed under /usr/local while Hadoop V2 is installed as a Cloudera CDH
release and so will have a defined set of directories:
Logging under /var/log; that is, /var/log/hadoop-hdfs/
Configuration under /etc/hadoop/conf/
Executables defined as services under /etc/init.d/; that is, hadoop-hdfs-namenode
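For example, assuming a CDH-style layout like the one above (the hadoop-hdfs-namenode service exists only on the node that hosts that role), you can inspect these locations from the shell:
ls /etc/hadoop/conf                    # configuration files
ls /var/log/hadoop-hdfs                # HDFS log files
service hadoop-hdfs-namenode status    # service script under /etc/init.d/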
I have also created two .bashrc environment configuration files for the Linux hadoop user account:
[hadoop@hc1nn ~]$ pwd
/home/hadoop
[hadoop@hc1nn ~]$ ls -l .bashrc*
lrwxrwxrwx. 1 hadoop hadoop 16 Jun 30 17:59 .bashrc -> .bashrc_hadoopv2
-rw-r--r--. 1 hadoop hadoop 1586 Jun 18 17:08 .bashrc_hadoopv1
-rw-r--r--. 1 hadoop hadoop 1588 Jul 27 11:33 .bashrc_hadoopv2
By switching the .bashrc symbolic link between the Hadoop V1 (.bashrc_hadoopv1) and V2 (.bashrc_hadoopv2)
files, I can quickly move between the two environments, each of which has a completely separate set of
resources. This approach lets me switch between Hadoop versions on my single set of testing servers while
writing this book. From a production viewpoint, however, you would install only one version of Hadoop at a time.
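As a sketch of what such a file might contain (the paths below are assumptions based on the install locations mentioned above, not the exact contents of my files), each version-specific file points the environment at its own Hadoop release:
# .bashrc_hadoopv1 (sketch; adjust the paths to your own install)
export JAVA_HOME=/usr/lib/jvm/java-1.6.0
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
Re-pointing the link is then a one-line operation, followed by starting a new shell:
ln -sfn .bashrc_hadoopv2 .bashrc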
Hadoop V1 Installation
Before you attempt to install Hadoop, you must ensure that Java 1.6.x is installed and that SSH (secure shell) is
installed and running. The master name node must be able to create an SSH session to reach each of its data nodes
without using a password in order to manage them. On CentOS, you can install SSH via the root account as follows:
yum install openssh-server
This will install the secure shell daemon process. Repeat this installation on all of your servers, then start the
service (as root):
service sshd restart
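On CentOS 6 you can also ensure that the daemon starts at boot time (an optional extra step; chkconfig is the standard CentOS 6 tool for this):
chkconfig sshd on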
Now, in order to make the SSH sessions from the name node to the data nodes operate without a password,
you must create an SSH key on the name node and copy it to each of the data nodes. You create the key with
the ssh-keygen command as the hadoop user (I created the hadoop user account during the installation of the CentOS
operating system on each server), as follows:
ssh-keygen
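Accept the defaults when prompted; an empty passphrase is what makes the password-less login possible. You can then copy the public key to each data node and test the connection. The host name hc1r1m1 below is a placeholder; substitute your own data node names:
ssh-copy-id hadoop@hc1r1m1
ssh hadoop@hc1r1m1
The second command should now log you in without prompting for a password.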
 