Cluster Setup and Installation
This section describes how to install and configure a basic Hadoop cluster from scratch using the Apache Hadoop distribution on a Unix operating system. It provides background information on the things you need to think about when setting up Hadoop. For a production installation, most users and operators should consider one of the Hadoop cluster management tools listed at the beginning of this chapter.
Installing Java
Hadoop runs on both Unix and Windows operating systems, and requires Java to be installed. For a production installation, you should select a combination of operating system, Java, and Hadoop that has been certified by the vendor of the Hadoop distribution you are using. There is also a page on the Hadoop wiki that lists combinations that community members have run with success.
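Before going any further, it's worth confirming that a suitable Java installation is on the path and that JAVA_HOME points at it. The JDK path shown below is only an assumption (a common location on Debian-based systems); substitute the location of your own installation:
% java -version
% export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64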
Creating Unix User Accounts
It's good practice to create dedicated Unix user accounts to separate the Hadoop processes from each other, and from other services running on the same machine. The HDFS, MapReduce, and YARN services are usually run as separate users, named hdfs, mapred, and yarn, respectively. They all belong to the same hadoop group.
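As a minimal sketch, the accounts can be created with the standard Linux user-management commands shown below; the exact flags vary between distributions (and differ again on BSD systems), so treat this as illustrative rather than definitive:
% sudo groupadd hadoop
% sudo useradd -g hadoop hdfs
% sudo useradd -g hadoop mapred
% sudo useradd -g hadoop yarn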
Installing Hadoop
Download Hadoop from the Apache Hadoop releases page, and unpack the contents of the distribution in a sensible location, such as /usr/local (/opt is another standard choice; note that Hadoop should not be installed in a user's home directory, as that may be an NFS-mounted directory):
% cd /usr/local
% sudo tar xzf hadoop-x.y.z.tar.gz
You also need to change the owner of the Hadoop files to be the hadoop user and group:
% sudo chown -R hadoop:hadoop hadoop-x.y.z
It's convenient to put the Hadoop binaries on the shell path too:
% export HADOOP_HOME=/usr/local/hadoop-x.y.z
% export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
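At this point you can check that the installation is on the shell path by asking Hadoop to report its version:
% hadoop version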