HDFS Federation
The namenode keeps a reference to every file and block in the filesystem in memory,
which means that on very large clusters with many files, memory becomes the limiting
factor for scaling (see How Much Memory Does a Namenode Need?). HDFS federation,
introduced in the 2.x release series, allows a cluster to scale by adding namenodes, each of
which manages a portion of the filesystem namespace. For example, one namenode might
manage all the files rooted under /user, say, and a second namenode might handle files under /share.
Under federation, each namenode manages a namespace volume, which is made up of the
metadata for the namespace, and a block pool containing all the blocks for the files in the
namespace. Namespace volumes are independent of each other, which means namenodes
do not communicate with one another, and furthermore the failure of one namenode does
not affect the availability of the namespaces managed by other namenodes. Block pool
storage is not partitioned, however, so datanodes register with each namenode in the
cluster and store blocks from multiple block pools.
To access a federated HDFS cluster, clients use client-side mount tables to map file paths
to namenodes. This is managed in configuration using ViewFileSystem and the
viewfs:// URIs.
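For illustration, a minimal client-side mount table for the /user and /share examples above could be declared in core-site.xml along the following lines; the mount table name clusterX and the namenode hostnames are hypothetical placeholders, not values from the text.

<!-- Sketch of a viewfs mount table (hypothetical cluster name and hosts). -->
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://clusterX/</value>
</property>
<property>
  <name>fs.viewfs.mounttable.clusterX.link./user</name>
  <value>hdfs://namenode1.example.com:8020/user</value>
</property>
<property>
  <name>fs.viewfs.mounttable.clusterX.link./share</name>
  <value>hdfs://namenode2.example.com:8020/share</value>
</property>

With a configuration of this shape, a client path such as /user/tom is routed to the first namenode and /share/lib to the second, without the client needing to know which namenode owns which part of the namespace.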
HDFS High Availability
The combination of replicating namenode metadata on multiple filesystems and using the
secondary namenode to create checkpoints protects against data loss, but it does not
provide high availability of the filesystem. The namenode is still a single point of failure
(SPOF). If it did fail, all clients — including MapReduce jobs — would be unable to read,
write, or list files, because the namenode is the sole repository of the metadata and the
file-to-block mapping. In such an event, the whole Hadoop system would effectively be
out of service until a new namenode could be brought online.
To recover from a failed namenode in this situation, an administrator starts a new primary
namenode with one of the filesystem metadata replicas and configures datanodes and clients to use this new namenode. The new namenode is not able to serve requests until it has (i) loaded its namespace image into memory, (ii) replayed its edit log, and (iii) received enough block reports from the datanodes to leave safe mode. On large clusters with
many files and blocks, the time it takes for a namenode to start from cold can be 30
minutes or more.
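As a rough sketch of this manual recovery on the new namenode machine, the steps might look like the following; the daemon script, backup location, and metadata directory are illustrative assumptions and vary with the Hadoop version and installation layout.

# Copy a surviving metadata replica (for example, from an NFS mount) into
# the directory named by dfs.namenode.name.dir; both paths are hypothetical.
cp -r /mnt/nn-metadata-backup/current /data/dfs/name/

# Start the namenode daemon on the new machine (Hadoop 2.x style script).
sbin/hadoop-daemon.sh start namenode

# The namenode serves requests only after loading the image, replaying the
# edit log, and receiving enough block reports; poll until safe mode is OFF.
hdfs dfsadmin -safemode get

Datanodes and clients must also be pointed at the new namenode, for example by reusing the old namenode's hostname or by updating their configuration, before normal operation resumes.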