[Figure 2.3 diagram: a Secondary NameNode; a master with the NameNode and JobTracker; four slaves, each with a DataNode and a TaskTracker]
Figure 2.3 Topology of a typical Hadoop cluster. It's a master/slave architecture
in which the NameNode and JobTracker are masters and the DataNodes and
TaskTrackers are slaves.
Having covered each of the Hadoop daemons, we depict the topology of one typical
Hadoop cluster in figure 2.3.
This topology features a master node running the NameNode and JobTracker
daemons and a standalone node with the SNN in case the master node fails. For small
clusters, the SNN can reside on one of the slave nodes. For large clusters, on the
other hand, the NameNode and JobTracker should run on two separate machines. The
slave machines each host a DataNode and a TaskTracker, so that tasks run on the same
node where their data is stored.
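The layout just described can be sketched as a small data model. This is only an illustration written in Python: the daemon names come from the text, but the hostnames, the Node class, and the build_cluster and daemon_count helpers are hypothetical names of our own and are not part of Hadoop.

```python
from dataclasses import dataclass, field

# Daemon names follow the text; everything else here is hypothetical.
MASTER_DAEMONS = {"NameNode", "JobTracker"}
SLAVE_DAEMONS = {"DataNode", "TaskTracker"}


@dataclass
class Node:
    hostname: str
    daemons: set = field(default_factory=set)


def build_cluster(num_slaves, separate_masters=False, snn_on_slave=False):
    """Return a list of Node objects laid out as the text describes."""
    if num_slaves < 1:
        raise ValueError("a cluster needs at least one slave node")

    nodes = []
    if separate_masters:
        # Large clusters: NameNode and JobTracker on two machines.
        nodes.append(Node("namenode-host", {"NameNode"}))
        nodes.append(Node("jobtracker-host", {"JobTracker"}))
    else:
        # Typical case: one master node runs both daemons.
        nodes.append(Node("master", set(MASTER_DAEMONS)))

    # Every slave hosts both a DataNode and a TaskTracker.
    slaves = [
        Node(f"slave{i}", set(SLAVE_DAEMONS))
        for i in range(1, num_slaves + 1)
    ]

    if snn_on_slave:
        # Small clusters: the SNN can share one of the slave machines.
        slaves[0].daemons.add("SecondaryNameNode")
    else:
        # Otherwise the SNN gets a standalone node.
        nodes.append(Node("snn-host", {"SecondaryNameNode"}))

    return nodes + slaves


def daemon_count(nodes, daemon):
    """Count how many nodes run the given daemon."""
    return sum(1 for node in nodes if daemon in node.daemons)


if __name__ == "__main__":
    for node in build_cluster(num_slaves=4):
        print(node.hostname, sorted(node.daemons))
```

Calling build_cluster(num_slaves=4) with the defaults mirrors figure 2.3: one master, one standalone SNN node, and four slaves. The two flags correspond to the small-cluster and large-cluster variations mentioned above.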
We'll work toward setting up a complete Hadoop cluster of this form by first
establishing the master node and the control channels between nodes. If a Hadoop
cluster is already available to you, you can skip the next section on how to set up Secure
Shell (SSH) channels between nodes. You also have a couple of options to run Hadoop
using only a single machine, in what are known as standalone and pseudo-distributed
modes. They're useful for development. Configuring Hadoop to run in these two modes
or the standard cluster setup (fully distributed mode) is covered in section 2.3.
2.2 Setting up SSH for a Hadoop cluster
When setting up a Hadoop cluster, you'll need to designate one specific node as the
master node. As shown in figure 2.3, this server will typically host the NameNode and
 