Starting Hadoop - Hadoop in Action - page 25

[Figure 2.3: diagram showing a Secondary NameNode, a master running the NameNode and JobTracker, and four slaves each running a DataNode and a TaskTracker]

Figure 2.3 Topology of a typical Hadoop cluster. It's a master/slave architecture in which the NameNode and JobTracker are masters and the DataNodes and TaskTrackers are slaves.

Having covered each of the Hadoop daemons, we depict the topology of one typical Hadoop cluster in figure 2.3.

This topology features a master node running the NameNode and JobTracker daemons, plus a separate node hosting the Secondary NameNode (SNN) in case the master node fails. For small clusters, the SNN can reside on one of the slave nodes. For large clusters, on the other hand, run the NameNode and the JobTracker on two separate machines. Each slave machine hosts both a DataNode and a TaskTracker, so that tasks run on the same node where their data is stored.
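To make the layout concrete, the following is a minimal sketch in Python that maps hosts to the daemons they would run under the topology just described. It is an illustration only, not part of Hadoop: the function `plan_cluster`, its parameters, and all host names (`master`, `snn-host`, `slave1`, and so on) are hypothetical. Only the daemon names and the placement rules come from the text.

```python
# Illustrative sketch only: host names and this helper are hypothetical.
# The daemon names and placement rules follow the topology in the text.

def plan_cluster(slaves, large=False, snn_on_slave=False):
    """Return a dict mapping each host name to the daemons it runs."""
    layout = {}

    if large:
        # Large clusters: NameNode and JobTracker on two separate machines.
        layout["namenode-host"] = ["NameNode"]
        layout["jobtracker-host"] = ["JobTracker"]
    else:
        # Typical cluster: a single master runs both master daemons.
        layout["master"] = ["NameNode", "JobTracker"]

    # Every slave pairs a DataNode with a TaskTracker so that tasks
    # run on the node where their data is stored.
    for host in slaves:
        layout[host] = ["DataNode", "TaskTracker"]

    if snn_on_slave and slaves:
        # Small clusters: the SNN can share one of the slave nodes.
        layout[slaves[0]].append("SecondaryNameNode")
    else:
        # Otherwise the SNN gets a node of its own.
        layout["snn-host"] = ["SecondaryNameNode"]

    return layout


layout = plan_cluster(["slave1", "slave2", "slave3", "slave4"])
for host, daemons in layout.items():
    print(f"{host}: {', '.join(daemons)}")
```

Calling `plan_cluster(["slave1"], snn_on_slave=True)` models the small-cluster variant, and `large=True` models the variant with the two master daemons split across machines.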

We'll work toward setting up a complete Hadoop cluster of this form by first establishing the master node and the control channels between nodes. If a Hadoop cluster is already available to you, you can skip the next section on how to set up Secure Shell (SSH) channels between nodes. You also have a couple of options to run Hadoop using only a single machine, in what are known as standalone and pseudo-distributed modes. They're useful for development. Configuring Hadoop to run in these two modes or the standard cluster setup (fully distributed mode) is covered in section 2.3.

2.2 Setting up SSH for a Hadoop cluster

When setting up a Hadoop cluster, you'll need to designate one specific node as the master node. As shown in figure 2.3, this server will typically host the NameNode and
