Chapter 8
Cluster Management
From its inception, Hadoop has been progressing and evolving to help you more easily manage your big data needs.
Compared to version 1, installation of Hadoop V2 via Cloudera's version 4.x stack was an advance; Hadoop tool
binaries were configured as Linux services and Hadoop's tool-related logging and functionality were moved to logical
places within the Linux file system. The progression continues with the move to cluster managers, which consolidate
all of the tools examined thus far in this book into a single management user interface. Cluster managers automate
much of the difficult task of installing and configuring Hadoop components.
This chapter examines Apache Ambari and the Cloudera Cluster Manager, two of several Hadoop cluster
managers that enable you to install the whole Hadoop stack in one go. Management systems like Ambari also use
cluster monitoring tools like Ganglia and Nagios to provide a user interface for management and monitoring within
a single system. In addition, in this chapter you'll learn about the Apache Bigtop tool, with which you can install the
whole stack, as well as run smoke tests during the installation to test the stack operation.
Although the installation of these components will include the whole Hadoop stack, this chapter primarily
demonstrates the ease of use and overall functionality of the installation systems themselves. (I would need an entire
book to cover each piece of subfunctionality within the Hadoop server stack.) Consider this chapter a snapshot of the
current systems and their functionality. Which system is best for your purposes is a question you can answer only after
matching your needs to their capabilities.
Because systems like Ambari install the whole Hadoop cluster, they are not compatible with pre-existing Hadoop
installs, and therefore they cannot use the same set of servers that was used in earlier chapters (hc1nn for the
name node and hc1r1m1 to hc1r1m3 for the data nodes). For this chapter's example, I install the cluster on a new set
of 64-bit machines but I preserve the work to date on the old set of machines whose Name Node server was called
hc1nn. The new Name Node server is called hc2nn, and the four data nodes are called hc2r1m1, hc2r1m2, hc2r1m3,
and hc2r1m4. As with the original servers, the “h” in these server names stands for Hadoop, the “c” indicates the cluster
number, the “r” represents the rack number in the cluster, and the “m” represents the machine number within the
Hadoop cluster rack. So, hc2nn is the Name Node server for Hadoop cluster 2. The server hc2r1m4 is the number 4
machine in rack 1 for Hadoop cluster 2. Also, because the systems examined in this chapter are intended for fresh
servers, I reinstall CentOS 6 on each machine prior to sourcing each system.
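The server naming convention described above is regular enough to check mechanically. The following is a minimal Python sketch, not part of any Hadoop tool, that splits a host name such as hc2nn or hc2r1m4 into its cluster, rack, and machine numbers. The names HadoopHost and parse_hostname are hypothetical and chosen only for illustration, and the sketch assumes that every host is either a name node (hc<cluster>nn) or a data node (hc<cluster>r<rack>m<machine>).

```python
import re
from typing import NamedTuple, Optional


class HadoopHost(NamedTuple):
    """Decoded parts of a host name (hypothetical helper type)."""
    cluster: int             # the number after "c"
    role: str                # "name node" or "data node"
    rack: Optional[int]      # the number after "r"; None for a name node
    machine: Optional[int]   # the number after "m"; None for a name node


# Assumed patterns: hc<cluster>nn and hc<cluster>r<rack>m<machine>.
_NAME_NODE = re.compile(r"^hc(\d+)nn$")
_DATA_NODE = re.compile(r"^hc(\d+)r(\d+)m(\d+)$")


def parse_hostname(name: str) -> HadoopHost:
    """Split a host name into cluster, role, rack, and machine numbers."""
    match = _NAME_NODE.match(name)
    if match:
        return HadoopHost(int(match.group(1)), "name node", None, None)
    match = _DATA_NODE.match(name)
    if match:
        cluster, rack, machine = (int(part) for part in match.groups())
        return HadoopHost(cluster, "data node", rack, machine)
    raise ValueError(f"{name!r} does not follow the naming convention")


if __name__ == "__main__":
    for host in ("hc2nn", "hc2r1m1", "hc2r1m4"):
        print(host, parse_hostname(host))
```

A check like this could be run over a host list before an installation to catch mistyped server names early.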
Initially, I install the Ambari Hadoop cluster manager. You will note that I am sourcing Ambari from the
Hortonworks site, so I use it to install the latest Hortonworks Hadoop stack. If you have attempted all of the Hadoop
tool installations up to this point in the book, you will have discovered that Hadoop installations and the necessary
configuration are time-consuming and can be difficult. You may encounter errors that take a lot of time to solve. You
might also find that versions of the components will not work with one another. But at this point, all you really want to
do is use the software.
This is where cluster managers become useful: they automate the installation and configuration of the Hadoop
cluster and the Hadoop tool set. They provide wizards that advise you when you need to make changes. They have
monitoring tools to automatically check the health of your Hadoop cluster. They also offer a means to continuously
upgrade the Hadoop stack.
 