The long recovery time is a problem for routine maintenance, too. In fact, because unexpected failure of the namenode is so rare, the case for planned downtime is actually more
important in practice.
Hadoop 2 remedied this situation by adding support for HDFS high availability (HA). In
this implementation, there are a pair of namenodes in an active-standby configuration. In
the event of the failure of the active namenode, the standby takes over its duties to continue servicing client requests without a significant interruption. A few architectural changes
are needed to allow this to happen:
▪ The namenodes must use highly available shared storage to share the edit log.
When a standby namenode comes up, it reads up to the end of the shared edit log
to synchronize its state with the active namenode, and then continues to read new
entries as they are written by the active namenode.
▪ Datanodes must send block reports to both namenodes because the block mappings are stored in a namenode's memory, and not on disk.
▪ Clients must be configured to handle namenode failover, using a mechanism that is transparent to users (a configuration sketch follows this list).
▪ The secondary namenode's role is subsumed by the standby, which takes periodic
checkpoints of the active namenode's namespace.
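Failover transparency comes from having clients address the cluster by a logical nameservice name rather than a particular namenode host; a client-side proxy provider then tries each namenode in turn. The following Java sketch shows a minimal set of HA client settings. The property keys are the standard Hadoop ones, but the nameservice name mycluster, the namenode IDs nn1 and nn2, and the hostnames are illustrative placeholders, not values from the text:

    // A minimal sketch of HA client configuration, set programmatically
    // for illustration; "mycluster", "nn1"/"nn2", and the hostnames are
    // placeholders. In practice these keys live in hdfs-site.xml.
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HaClientConfig {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // One logical nameservice standing for the active-standby pair
            conf.set("dfs.nameservices", "mycluster");
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1",
                "namenode1.example.com:8020");
            conf.set("dfs.namenode.rpc-address.mycluster.nn2",
                "namenode2.example.com:8020");
            // Client-side proxy that retries against the other namenode
            // when a failover occurs
            conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha."
                + "ConfiguredFailoverProxyProvider");
            // Clients refer only to the logical name, never to an
            // individual namenode host
            FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
            System.out.println(fs.getUri());
        }
    }

Because clients only ever see the logical name hdfs://mycluster, a failover requires no client-side change.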
There are two choices for the highly available shared storage: an NFS filer, or a quorum
journal manager (QJM). The QJM is a dedicated HDFS implementation, designed for the
sole purpose of providing a highly available edit log, and is the recommended choice for
most HDFS installations. The QJM runs as a group of journal nodes, and each edit must
be written to a majority of the journal nodes. Typically, there are three journal nodes, so
the system can tolerate the loss of one of them. This arrangement is similar to the way
ZooKeeper works, although it is important to realize that the QJM implementation does
not use ZooKeeper. (Note, however, that HDFS HA does use ZooKeeper for electing the
active namenode, as explained in the next section.)
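The quorum arithmetic is simple but worth spelling out: a majority of N journal nodes is N/2 + 1 (integer division), so the system tolerates the loss of N minus that many nodes. The sketch below is illustrative only, not the actual QJM code:

    // Illustrative only: shows the majority rule that decides when an
    // edit written to the journal nodes counts as durable.
    public class QuorumMath {

        // Minimum acknowledgements needed for an edit to be durable
        static int majority(int journalNodes) {
            return journalNodes / 2 + 1;
        }

        // Journal node failures the quorum can survive
        static int tolerableFailures(int journalNodes) {
            return journalNodes - majority(journalNodes);
        }

        public static void main(String[] args) {
            for (int n : new int[] {3, 4, 5}) {
                System.out.printf(
                    "%d journal nodes: majority = %d, tolerates %d failure(s)%n",
                    n, majority(n), tolerableFailures(n));
            }
        }
    }

Running it shows why journal nodes are deployed in odd numbers: three nodes tolerate one failure and five tolerate two, whereas a fourth node raises the majority from two to three without tolerating any additional failure.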
If the active namenode fails, the standby can take over very quickly (in a few tens of
seconds) because it has the latest state available in memory: both the latest edit log entries
and an up-to-date block mapping. The actual observed failover time will be longer in
practice (around a minute or so), because the system needs to be conservative in deciding
that the active namenode has failed.
In the unlikely event of the standby being down when the active fails, the administrator
can still start the standby from cold. This is no worse than the non-HA case, and from an
operational point of view it's an improvement, because the process is a standard operational procedure built into Hadoop.