The long recovery time is a problem for routine maintenance, too. In fact, because unexpected failure of the namenode is so rare, the case for planned downtime is actually more
important in practice.
Hadoop 2 remedied this situation by adding support for HDFS high availability (HA). In
this implementation, there are a pair of namenodes in an active-standby configuration. In
the event of the failure of the active namenode, the standby takes over its duties to continue servicing client requests without a significant interruption. A few architectural changes
are needed to allow this to happen:
▪ The namenodes must use highly available shared storage to share the edit log.
When a standby namenode comes up, it reads up to the end of the shared edit log
to synchronize its state with the active namenode, and then continues to read new
entries as they are written by the active namenode.
▪ Datanodes must send block reports to both namenodes because the block mappings are stored in a namenode's memory, and not on disk.
▪ Clients must be configured to handle namenode failover, using a mechanism that is transparent to users (a configuration sketch follows this list).
▪ The secondary namenode's role is subsumed by the standby, which takes periodic
checkpoints of the active namenode's namespace.
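Failover transparency comes from having clients address the cluster by a logical nameservice name rather than a particular namenode host; a client-side proxy provider then tries each namenode in turn. The following Java sketch shows a minimal set of HA client settings. The property keys are the standard Hadoop ones, but the nameservice name mycluster, the namenode IDs nn1 and nn2, and the hostnames are illustrative placeholders, not values from the text:

    // A minimal sketch of HA client configuration, set programmatically
    // for illustration; "mycluster", "nn1"/"nn2", and the hostnames are
    // placeholders. In practice these keys live in hdfs-site.xml.
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HaClientConfig {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // One logical nameservice standing for the active-standby pair
            conf.set("dfs.nameservices", "mycluster");
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1",
                "namenode1.example.com:8020");
            conf.set("dfs.namenode.rpc-address.mycluster.nn2",
                "namenode2.example.com:8020");
            // Client-side proxy that retries against the other namenode
            // when a failover occurs
            conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha."
                + "ConfiguredFailoverProxyProvider");
            // Clients refer only to the logical name, never to an
            // individual namenode host
            FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
            System.out.println(fs.getUri());
        }
    }

Because clients only ever see the logical name hdfs://mycluster, a failover requires no client-side change.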
There are two choices for the highly available shared storage: an NFS filer, or a quorum
journal manager (QJM). The QJM is a dedicated HDFS implementation, designed for the
sole purpose of providing a highly available edit log, and is the recommended choice for
most HDFS installations. The QJM runs as a group of journal nodes, and each edit must
be written to a majority of the journal nodes. Typically, there are three journal nodes, so
the system can tolerate the loss of one of them. This arrangement is similar to the way
ZooKeeper works, although it is important to realize that the QJM implementation does
not use ZooKeeper. (Note, however, that HDFS HA does use ZooKeeper for electing the
active namenode, as explained in the next section.)
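The quorum arithmetic is simple but worth spelling out: a majority of N journal nodes is N/2 + 1 (integer division), so the system tolerates the loss of N minus that many nodes. The sketch below is illustrative only, not the actual QJM code:

    // Illustrative only: shows the majority rule that decides when an
    // edit written to the journal nodes counts as durable.
    public class QuorumMath {

        // Minimum acknowledgements needed for an edit to be durable
        static int majority(int journalNodes) {
            return journalNodes / 2 + 1;
        }

        // Journal node failures the quorum can survive
        static int tolerableFailures(int journalNodes) {
            return journalNodes - majority(journalNodes);
        }

        public static void main(String[] args) {
            for (int n : new int[] {3, 4, 5}) {
                System.out.printf(
                    "%d journal nodes: majority = %d, tolerates %d failure(s)%n",
                    n, majority(n), tolerableFailures(n));
            }
        }
    }

Running it shows why journal nodes are deployed in odd numbers: three nodes tolerate one failure and five tolerate two, whereas a fourth node raises the majority from two to three without tolerating any additional failure.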
If the active namenode fails, the standby can take over very quickly (in a few tens of
seconds) because it has the latest state available in memory: both the latest edit log entries
and an up-to-date block mapping. The actual observed failover time will be longer in
practice (around a minute or so), because the system needs to be conservative in deciding
that the active namenode has failed.
In the unlikely event of the standby being down when the active fails, the administrator
can still start the standby from cold. This is no worse than the non-HA case, and from an
operational point of view it's an improvement, because the process is a standard operational procedure built into Hadoop.