Cluster Management - Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

Chapter 8

Cluster Management

From its inception, Hadoop has been progressing and evolving to help you more easily manage your big data needs. Compared to version 1, installation of Hadoop V2 via Cloudera's version 4.x stack was an advance: Hadoop tool binaries were configured as Linux services, and Hadoop's tool-related logging and functionality were moved to logical places within the Linux file system. The progression continues with the move to cluster managers, which consolidate all of the tools examined thus far in this book into a single management user interface. Cluster managers automate much of the difficult work of installing Hadoop components and configuring them.

This chapter examines Apache Ambari and the Cloudera Cluster Manager, two of several Hadoop cluster managers that enable you to install the whole Hadoop stack in one go. Management systems like Ambari also use cluster monitoring tools like Ganglia and Nagios to provide a user interface for management and monitoring within a single system. You'll also learn about the Apache Bigtop tool, with which you can install the whole stack and run smoke tests during the installation to check that the stack operates correctly.

Although the installation of these components will include the whole Hadoop stack, this chapter primarily demonstrates the ease of use and overall functionality of the installation systems themselves. (I would need an entire book to cover each piece of subfunctionality within the Hadoop server stack.) Consider this chapter a snapshot of the current systems and their functionality. Which system is best for your purposes is a question you can answer only after matching your needs to their capabilities.

Because systems like Ambari install the whole Hadoop cluster, they are not compatible with pre-existing Hadoop installs, and therefore they cannot use the same set of servers discussed in earlier chapters (hc1nn for the name node and hc1r1m1 to hc1r1m3 for the data nodes). For this chapter's example, I install the cluster on a new set of 64-bit machines, but I preserve the work to date on the old set of machines, whose Name Node server was called hc1nn. The new Name Node server is called hc2nn, and the four data nodes are called hc2r1m1, hc2r1m2, hc2r1m3, and hc2r1m4. As with the original servers, the “h” in these server names stands for Hadoop, the “c” indicates the cluster number, the “r” represents the rack number in the cluster, and the “m” represents the machine number within the Hadoop cluster rack. So, hc2nn is the Name Node server for Hadoop cluster 2, and hc2r1m4 is machine number 4 in rack 1 of Hadoop cluster 2. Also, because the systems examined in this chapter are intended for fresh servers, I reinstall CentOS 6 on each machine prior to sourcing each system.
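
The naming convention just described can be sketched in a few lines of Python. This is only an illustration of how the names decode, not part of Ambari or any Hadoop tool; the function parse_hostname, the HostInfo type, and the role labels "namenode" and "datanode" are hypothetical names chosen for this example.

```python
import re
from typing import NamedTuple, Optional


class HostInfo(NamedTuple):
    """Decoded parts of a host name such as hc2nn or hc2r1m4 (hypothetical type)."""
    cluster: int
    role: str                # "namenode" or "datanode" (labels assumed for this sketch)
    rack: Optional[int]      # None for a Name Node
    machine: Optional[int]   # None for a Name Node


# "h" = Hadoop, "c<n>" = cluster number, then either "nn" (Name Node)
# or "r<n>m<n>" (rack number and machine number within the rack).
_HOST_RE = re.compile(r"hc(?P<cluster>\d+)(?:nn|r(?P<rack>\d+)m(?P<machine>\d+))")


def parse_hostname(name: str) -> HostInfo:
    """Split a host name into cluster, role, rack, and machine numbers."""
    match = _HOST_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"host name does not follow the convention: {name!r}")
    cluster = int(match.group("cluster"))
    if match.group("rack") is None:
        # The "nn" branch matched, so this is the Name Node for the cluster.
        return HostInfo(cluster, "namenode", None, None)
    return HostInfo(
        cluster,
        "datanode",
        int(match.group("rack")),
        int(match.group("machine")),
    )


if __name__ == "__main__":
    for host in ("hc2nn", "hc2r1m1", "hc2r1m4"):
        print(host, parse_hostname(host))
```

Given hc2r1m4, the function reports cluster 2, rack 1, machine 4; given hc2nn, it reports the Name Node of cluster 2 with no rack or machine number.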

Initially, I install the Ambari Hadoop cluster manager. You will note that I am sourcing Ambari from the Hortonworks site, so I use it to install the latest Hortonworks Hadoop stack. If you have attempted all of the Hadoop tool installations up to this point in the book, you will have discovered that Hadoop installations and the necessary configuration are time-consuming and can be difficult. You may encounter errors that take a lot of time to solve. You might also find that versions of the components will not work with one another. But at this point, all you really want to do is use the software.

This is where cluster managers become useful: they automate the installation and configure the Hadoop cluster and the Hadoop tool set. They provide wizards that advise you when you need to make changes. They have monitoring tools to automatically check the health of your Hadoop cluster. They also offer a means to continuously upgrade the Hadoop stack.
