To be safer, this new NameNode should also have a backup node set up before you start it. Otherwise you'll be in trouble if this new NameNode fails too. If you don't have a machine readily available as a backup, you should at least set up an NFS-mounted directory. This way the filesystem's state information is in more than one location. As HDFS writes its metadata to all directories listed in dfs.name.dir , if your NameNode has multiple hard drives, you can specify directories from different drives to hold replicas of the metadata. This way if one drive fails, it's easier to restart the NameNode without the bad drive than to switch over to the backup node, which involves moving the IP address, setting up a new backup node, and so on.
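The multi-directory setup just described can be expressed in hdfs-site.xml. A minimal sketch, in which the specific paths (two local drives plus an NFS mount) are illustrative, not prescribed:

```xml
<property>
  <name>dfs.name.dir</name>
  <!-- Example paths: two local drives plus an NFS-mounted directory.
       HDFS writes a full copy of the metadata to each one. -->
  <value>/disk1/hdfs/name,/disk2/hdfs/name,/mnt/nfs/hdfs/name</value>
</property>
```

With this configuration, the loss of any single location leaves the NameNode with intact metadata copies elsewhere.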
Recall that the SNN creates a snapshot of the filesystem's metadata in the
fs.checkpoint.dir directory. As it checkpoints only periodically (once an hour
under the default setup), the metadata is too stale to rely on for failover. But it's still a
good idea to archive this directory periodically to remote storage. In catastrophic situations, recovering from stale data is better than having no data at all. Such a situation can arise if both the NameNode and the backup fail simultaneously (say, a power surge affecting both machines). Another unfortunate scenario is if the filesystem's metadata has been corrupted (say, by human error or a software bug) and has poisoned all the replicas. Recovering from a checkpoint image is explained in http://issues.apache.org/jira/browse/HADOOP-2585 .
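Such periodic archiving can be scripted in many ways; the following is one minimal sketch in Python, meant to run from cron shortly after each checkpoint interval. The paths here are assumptions for illustration — point them at your actual fs.checkpoint.dir and a remote (e.g. NFS-mounted) archive location.

```python
import shutil
import time
from pathlib import Path

# Illustrative paths, not Hadoop defaults -- adjust for your cluster.
CHECKPOINT_DIR = "/var/hadoop/dfs/namesecondary"
ARCHIVE_DIR = "/mnt/remote-backup/hdfs-checkpoints"

def archive_checkpoint(checkpoint_dir=CHECKPOINT_DIR, archive_dir=ARCHIVE_DIR):
    """Copy the SNN's latest checkpoint into a timestamped archive.

    Run from a scheduler so a stale-but-usable copy of the metadata
    survives even if the NameNode and its backup fail together.
    """
    Path(archive_dir).mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = str(Path(archive_dir) / f"checkpoint-{stamp}")
    # make_archive writes <target>.tar.gz from the checkpoint tree
    return shutil.make_archive(target, "gztar", root_dir=checkpoint_dir)
```

Because the checkpoint is only hourly by default, each archive is stale; the point is durability, not freshness.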
HDFS's backup and recovery mechanism is undergoing active improvements as
of this writing. You should check with HDFS's online documentation for the latest
news. Specialized Linux software such as DRBD 5 has also been applied to Hadoop clusters for high availability. You can find one example at http://www.cloudera.com/blog/2009/07/22/hadoop-ha-configuration/.
8.10 Designing network layout and rack awareness
When your Hadoop cluster gets big, the nodes will be spread out in more than one
rack and the cluster's network topology starts to affect reliability and performance.
You may want the cluster to survive the failure of an entire rack. You should place your
backup server for NameNode, as described in the previous section, in a separate rack
from the NameNode itself. This way the failure of any one rack will not destroy all copies of the filesystem's metadata.
With more than one rack, the placement of both block replicas and tasks becomes
more complex. Replicas of a block should be placed in separate racks to reduce the
potential of data loss. For the standard replication value of 3, the default placement
policy for writing a block is this: If the client performing the write operation is part of
the Hadoop cluster, place the first replica
on the DataNode where the client resides.
Otherwise randomly place the replica in the cluster. Place the second replica on a
random rack different from the rack where the first replica resides. Write the third
replica to a different node on the same rack as the second replica. For replication
values higher than 3, place the subsequent replicas on random nodes. As of this
5 http://www.drbd.org.
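The default placement policy described above can be sketched as follows. This is an illustrative model, not Hadoop's internal implementation; the node map and function names are invented for the example.

```python
import random

def choose_replica_nodes(nodes, writer=None, replication=3):
    """Sketch of the default placement policy for one block.

    `nodes` maps each DataNode hostname to its rack id; `writer` is
    the client's hostname if the client runs inside the cluster,
    else None. Assumes at least two racks with two nodes each.
    """
    chosen = []
    # First replica: on the writer's own DataNode if it is in the
    # cluster, otherwise on a random node.
    first = writer if writer in nodes else random.choice(sorted(nodes))
    chosen.append(first)
    # Second replica: a random node on a different rack than the first.
    off_rack = [h for h in nodes if nodes[h] != nodes[first]]
    second = random.choice(off_rack)
    chosen.append(second)
    # Third replica: a different node on the same rack as the second.
    same_rack = [h for h in nodes
                 if nodes[h] == nodes[second] and h != second]
    chosen.append(random.choice(same_rack))
    # Replication > 3: remaining replicas go on random unused nodes.
    while len(chosen) < replication:
        chosen.append(random.choice([h for h in nodes if h not in chosen]))
    return chosen
```

Note how the policy trades off safety (the second replica is always off-rack) against write bandwidth (the third replica stays on the second's rack, avoiding another cross-rack transfer).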