Chapter 10. Setting Up a Hadoop Cluster
This chapter explains how to set up Hadoop to run on a cluster of machines. Running HDFS, MapReduce, and YARN on a single machine is great for learning about these systems, but to do useful work, they need to run on multiple nodes.
There are a few options when it comes to getting a Hadoop cluster, from building your own to running on rented hardware or using an offering that provides Hadoop as a hosted service in the cloud. The number of hosted options is too large to list here, but even if you choose to build a Hadoop cluster yourself, there are still a number of installation options:
Apache tarballs
The Apache Hadoop project and related projects provide binary (and source) tarballs for each release. Installation from binary tarballs gives you the most flexibility but entails the most work, since you need to decide where the installation files, configuration files, and logfiles are located on the filesystem, set their file permissions correctly, and so on.
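For example, a bare-bones tarball installation might look something like the following sketch. The version placeholder, installation prefix, dedicated hadoop user, and configuration and log directories are illustrative choices, not requirements:

# Unpack the release tarball under a prefix of your choosing
% sudo tar xzf hadoop-x.y.z.tar.gz -C /opt
% sudo ln -s /opt/hadoop-x.y.z /opt/hadoop

# Give a dedicated hadoop user ownership of the installation
% sudo chown -R hadoop:hadoop /opt/hadoop-x.y.z

# Point Hadoop at the configuration and log directories you chose
% export HADOOP_HOME=/opt/hadoop
% export HADOOP_CONF_DIR=/etc/hadoop/conf
% export HADOOP_LOG_DIR=/var/log/hadoop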
Packages
RPM and Debian packages are available from the Apache Bigtop project, as well as
from all the Hadoop vendors. Packages bring a number of advantages over tarballs: they
provide a consistent filesystem layout, they are tested together as a stack (so you know
that the versions of Hadoop and Hive, say, will work together), and they work well with
configuration management tools like Puppet.
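As a rough illustration, a package-based install on a Debian-based system might look like this. The package and service names shown follow the Bigtop/vendor convention and can differ between distributions and releases:

# Install the common files plus the daemons this node should run
% sudo apt-get install hadoop hadoop-hdfs-namenode hadoop-yarn-resourcemanager

# Configuration lands in a consistent location (e.g. /etc/hadoop/conf),
# and daemons are managed by the system's service manager
% sudo service hadoop-hdfs-namenode start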
Hadoop cluster management tools
Cloudera Manager and Apache Ambari are examples of dedicated tools for installing
and managing a Hadoop cluster over its whole lifecycle. They provide a simple web UI,
and are the recommended way to set up a Hadoop cluster for most users and operators.
These tools encode a lot of operator knowledge about running Hadoop. For example,
they use heuristics based on the hardware profile (among other factors) to choose good
defaults for Hadoop configuration settings. For more complex setups, like HA or secure Hadoop, the management tools provide well-tested wizards for getting a working cluster in a short amount of time. Finally, they add extra features that the other installation options don't offer, such as unified monitoring and log search, and rolling upgrades (so you can upgrade the cluster without experiencing downtime).
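To give a sense of how little manual work is involved, bootstrapping a cluster with Apache Ambari typically amounts to installing and starting the server, then completing its web-based install wizard. The sketch below assumes the Ambari package repository has already been added for your operating system; the default web UI port is 8080, though details vary by version:

# Install the Ambari server package
% sudo apt-get install ambari-server

# Run the interactive setup (database, JDK, and so on), then start the server
% sudo ambari-server setup
% sudo ambari-server start

# The cluster install wizard is then available at http://<ambari-server-host>:8080/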
This chapter and the next give you enough information to set up and operate your own basic cluster, but even if you are using Hadoop cluster management tools or a service in which a lot of the routine setup and maintenance are done for you, these chapters still offer valuable information about how Hadoop works from an operations point of view.