To be safer, this new NameNode should also have a backup node set up before you start it. Otherwise you'll be in trouble if this new NameNode fails too. If you don't have a machine readily available as a backup, you should at least set up an NFS-mounted directory. This way the filesystem's state information is in more than one location. As HDFS writes its metadata to all directories listed in dfs.name.dir , if your NameNode has multiple hard drives, you can specify directories from different drives to hold replicas of the metadata. This way if one drive fails, it's easier to restart the NameNode without the bad drive than to switch over to the backup node, which involves moving the IP address, setting up a new backup node, and so on.
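The multi-directory setup just described can be expressed in hdfs-site.xml. A minimal sketch, in which the specific paths (two local drives plus an NFS mount) are illustrative, not prescribed:

```xml
<property>
  <name>dfs.name.dir</name>
  <!-- Example paths: two local drives plus an NFS-mounted directory.
       HDFS writes a full copy of the metadata to each one. -->
  <value>/disk1/hdfs/name,/disk2/hdfs/name,/mnt/nfs/hdfs/name</value>
</property>
```

With this configuration, the loss of any single location leaves the NameNode with intact metadata copies elsewhere.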
Recall that the SNN creates a snapshot of the filesystem's metadata in the
fs.checkpoint.dir directory. As it checkpoints only periodically (once an hour
under the default setup), the metadata is too stale to rely on for failover. But it's still a
good idea to archive this directory periodically to remote storage. In catastrophic situations, recovering from stale data is better than having no data at all. Such a situation can arise if both the NameNode and the backup fail simultaneously (say, a power surge affecting both machines). Another unfortunate scenario is if the filesystem's metadata has been corrupted (say, by human error or a software bug) and has poisoned all the replicas. Recovering from a checkpoint image is explained in http://issues.apache.org/jira/browse/HADOOP-2585 .
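Such periodic archiving can be scripted in many ways; the following is one minimal sketch in Python, meant to run from cron shortly after each checkpoint interval. The paths here are assumptions for illustration — point them at your actual fs.checkpoint.dir and a remote (e.g. NFS-mounted) archive location.

```python
import shutil
import time
from pathlib import Path

# Illustrative paths, not Hadoop defaults -- adjust for your cluster.
CHECKPOINT_DIR = "/var/hadoop/dfs/namesecondary"
ARCHIVE_DIR = "/mnt/remote-backup/hdfs-checkpoints"

def archive_checkpoint(checkpoint_dir=CHECKPOINT_DIR, archive_dir=ARCHIVE_DIR):
    """Copy the SNN's latest checkpoint into a timestamped archive.

    Run from a scheduler so a stale-but-usable copy of the metadata
    survives even if the NameNode and its backup fail together.
    """
    Path(archive_dir).mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = str(Path(archive_dir) / f"checkpoint-{stamp}")
    # make_archive writes <target>.tar.gz from the checkpoint tree
    return shutil.make_archive(target, "gztar", root_dir=checkpoint_dir)
```

Because the checkpoint is only hourly by default, each archive is stale; the point is durability, not freshness.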
HDFS's backup and recovery mechanism is undergoing active improvements as
of this writing. You should check with HDFS's online documentation for the latest
news. Specialized Linux software such as DRBD 5 has also been applied to Hadoop clusters for high availability. You can find one example at http://www.cloudera.com/blog/2009/07/22/hadoop-ha-configuration/.
8.10 Designing network layout and rack awareness
When your Hadoop cluster gets big, the nodes will be spread out in more than one
rack and the cluster's network topology starts to affect reliability and performance.
You may want the cluster to survive the failure of an entire rack. You should place your
backup server for NameNode, as described in the previous section, in a separate rack
from the NameNode itself. This way the failure of any one rack will not destroy all copies of the filesystem's metadata.
With more than one rack, the placement of both block replicas and tasks becomes
more complex. Replicas of a block should be placed in separate racks to reduce the
potential of data loss. For the standard replication value of 3, the default placement
policy for writing a block is this: If the client performing the write operation is part of
the Hadoop cluster, place the first replica
on the DataNode where the client resides.
Otherwise randomly place the replica in the cluster. Place the second replica on a
random rack different from the rack where the first replica resides. Write the third
replica to a different node on the same rack as the second replica. For replication
values higher than 3, place the subsequent replicas on random nodes. As of this
5 http://www.drbd.org.
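The default placement policy described above can be sketched as follows. This is an illustrative model, not Hadoop's internal implementation; the node map and function names are invented for the example.

```python
import random

def choose_replica_nodes(nodes, writer=None, replication=3):
    """Sketch of the default placement policy for one block.

    `nodes` maps each DataNode hostname to its rack id; `writer` is
    the client's hostname if the client runs inside the cluster,
    else None. Assumes at least two racks with two nodes each.
    """
    chosen = []
    # First replica: on the writer's own DataNode if it is in the
    # cluster, otherwise on a random node.
    first = writer if writer in nodes else random.choice(sorted(nodes))
    chosen.append(first)
    # Second replica: a random node on a different rack than the first.
    off_rack = [h for h in nodes if nodes[h] != nodes[first]]
    second = random.choice(off_rack)
    chosen.append(second)
    # Third replica: a different node on the same rack as the second.
    same_rack = [h for h in nodes
                 if nodes[h] == nodes[second] and h != second]
    chosen.append(random.choice(same_rack))
    # Replication > 3: remaining replicas go on random unused nodes.
    while len(chosen) < replication:
        chosen.append(random.choice([h for h in nodes if h not in chosen]))
    return chosen
```

Note how the policy trades off safety (the second replica is always off-rack) against write bandwidth (the third replica stays on the second's rack, avoiding another cross-rack transfer).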