Cluster Setup and Installation
This section describes how to install and configure a basic Hadoop cluster from scratch using the Apache Hadoop distribution on a Unix operating system. It provides background information on the things you need to think about when setting up Hadoop. For a production installation, most users and operators should consider one of the Hadoop cluster management tools listed at the beginning of this chapter.
Installing Java
Hadoop runs on both Unix and Windows operating systems, and requires Java to be installed. For a production installation, you should select a combination of operating system, Java, and Hadoop that has been certified by the vendor of the Hadoop distribution you are using. There is also a page on the Hadoop wiki that lists combinations that community members have run with success.
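Before going any further, it's worth confirming that a suitable Java installation is on the path and that JAVA_HOME points at it. The JDK path shown below is only an assumption (a common location on Debian-based systems); substitute the location of your own installation:
% java -version
% export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64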
Creating Unix User Accounts
It's good practice to create dedicated Unix user accounts to separate the Hadoop processes from each other, and from other services running on the same machine. The HDFS, MapReduce, and YARN services are usually run as separate users, named hdfs, mapred, and yarn, respectively. They all belong to the same hadoop group.
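As a minimal sketch, the accounts can be created with the standard Linux user-management commands shown below; the exact flags vary between distributions (and differ again on BSD systems), so treat this as illustrative rather than definitive:
% sudo groupadd hadoop
% sudo useradd -g hadoop hdfs
% sudo useradd -g hadoop mapred
% sudo useradd -g hadoop yarn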
Installing Hadoop
Download Hadoop from the Apache Hadoop releases page, and unpack the contents of the distribution in a sensible location, such as /usr/local (/opt is another standard choice; note that Hadoop should not be installed in a user's home directory, as that may be an NFS-mounted directory):
% cd /usr/local
% sudo tar xzf hadoop-x.y.z.tar.gz
You also need to change the owner of the Hadoop files to be the hadoop user and group:
% sudo chown -R hadoop:hadoop hadoop-x.y.z
It's convenient to put the Hadoop binaries on the shell path too:
% export HADOOP_HOME=/usr/local/hadoop-x.y.z
% export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
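At this point you can check that the installation is on the shell path by asking Hadoop to report its version:
% hadoop version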