Chapter 10. Setting Up a Hadoop Cluster
This chapter explains how to set up Hadoop to run on a cluster of machines. Running HDFS, MapReduce, and YARN on a single machine is great for learning about these systems, but to do useful work, they need to run on multiple nodes.
There are a few options when it comes to getting a Hadoop cluster, from building your own to running on rented hardware or using an offering that provides Hadoop as a hosted service in the cloud. The number of hosted options is too large to list here, but even if you choose to build a Hadoop cluster yourself, there are still a number of installation options:
Apache tarballs
The Apache Hadoop project and related projects provide binary (and source) tarballs for each release. Installation from binary tarballs gives you the most flexibility but entails the most work, since you need to decide where the installation files, configuration files, and logfiles are located on the filesystem, set their file permissions correctly, and so on.
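For example, a bare-bones tarball installation might look something like the following sketch. The version placeholder, installation prefix, dedicated hadoop user, and configuration and log directories are illustrative choices, not requirements:

# Unpack the release tarball under a prefix of your choosing
% sudo tar xzf hadoop-x.y.z.tar.gz -C /opt
% sudo ln -s /opt/hadoop-x.y.z /opt/hadoop

# Give a dedicated hadoop user ownership of the installation
% sudo chown -R hadoop:hadoop /opt/hadoop-x.y.z

# Point Hadoop at the configuration and log directories you chose
% export HADOOP_HOME=/opt/hadoop
% export HADOOP_CONF_DIR=/etc/hadoop/conf
% export HADOOP_LOG_DIR=/var/log/hadoop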
Packages
RPM and Debian packages are available from the Apache Bigtop project, as well as
from all the Hadoop vendors. Packages bring a number of advantages over tarballs: they
provide a consistent filesystem layout, they are tested together as a stack (so you know
that the versions of Hadoop and Hive, say, will work together), and they work well with
configuration management tools like Puppet.
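As a rough illustration, a package-based install on a Debian-based system might look like this. The package and service names shown follow the Bigtop/vendor convention and can differ between distributions and releases:

# Install the common files plus the daemons this node should run
% sudo apt-get install hadoop hadoop-hdfs-namenode hadoop-yarn-resourcemanager

# Configuration lands in a consistent location (e.g. /etc/hadoop/conf),
# and daemons are managed by the system's service manager
% sudo service hadoop-hdfs-namenode start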
Hadoop cluster management tools
Cloudera Manager and Apache Ambari are examples of dedicated tools for installing
and managing a Hadoop cluster over its whole lifecycle. They provide a simple web UI,
and are the recommended way to set up a Hadoop cluster for most users and operators.
These tools encode a lot of operator knowledge about running Hadoop. For example,
they use heuristics based on the hardware profile (among other factors) to choose good
defaults for Hadoop configuration settings. For more complex setups, like HA or secure Hadoop, the management tools provide well-tested wizards for getting a working cluster in a short amount of time. Finally, they add extra features that the other installation options don't offer, such as unified monitoring and log search, and rolling upgrades (so you can upgrade the cluster without experiencing downtime).
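To give a sense of how little manual work is involved, bootstrapping a cluster with Apache Ambari typically amounts to installing and starting the server, then completing its web-based install wizard. The sketch below assumes the Ambari package repository has already been added for your operating system; the default web UI port is 8080, though details vary by version:

# Install the Ambari server package
% sudo apt-get install ambari-server

# Run the interactive setup (database, JDK, and so on), then start the server
% sudo ambari-server setup
% sudo ambari-server start

# The cluster install wizard is then available at http://<ambari-server-host>:8080/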
This chapter and the next give you enough information to set up and operate your own basic cluster, but even if you are using Hadoop cluster management tools or a service in which a lot of the routine setup and maintenance are done for you, these chapters still offer valuable information about how Hadoop works from an operations point of view.