HDFS Federation
The namenode keeps a reference to every file and block in the filesystem in memory,
which means that on very large clusters with many files, memory becomes the limiting
factor for scaling (see How Much Memory Does a Namenode Need?). HDFS federation,
introduced in the 2.x release series, allows a cluster to scale by adding namenodes, each of
which manages a portion of the filesystem namespace. For example, one namenode might
manage all the files rooted under /user, say, and a second namenode might handle files under /share.
Under federation, each namenode manages a namespace volume, which is made up of the
metadata for the namespace, and a block pool containing all the blocks for the files in the
namespace. Namespace volumes are independent of each other, which means namenodes
do not communicate with one another, and furthermore the failure of one namenode does
not affect the availability of the namespaces managed by other namenodes. Block pool
storage is not partitioned, however, so datanodes register with each namenode in the
cluster and store blocks from multiple block pools.
To access a federated HDFS cluster, clients use client-side mount tables to map file paths
to namenodes. This is managed in configuration using ViewFileSystem and the
viewfs:// URIs.
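For illustration, a minimal client-side mount table for the /user and /share examples above could be declared in core-site.xml along the following lines; the mount table name clusterX and the namenode hostnames are hypothetical placeholders, not values from the text.

<!-- Sketch of a viewfs mount table (hypothetical cluster name and hosts). -->
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://clusterX/</value>
</property>
<property>
  <name>fs.viewfs.mounttable.clusterX.link./user</name>
  <value>hdfs://namenode1.example.com:8020/user</value>
</property>
<property>
  <name>fs.viewfs.mounttable.clusterX.link./share</name>
  <value>hdfs://namenode2.example.com:8020/share</value>
</property>

With a configuration of this shape, a client path such as /user/tom is routed to the first namenode and /share/lib to the second, without the client needing to know which namenode owns which part of the namespace.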
HDFS High Availability
The combination of replicating namenode metadata on multiple filesystems and using the
secondary namenode to create checkpoints protects against data loss, but it does not
provide high availability of the filesystem. The namenode is still a single point of failure
(SPOF). If it did fail, all clients — including MapReduce jobs — would be unable to read,
write, or list files, because the namenode is the sole repository of the metadata and the
file-to-block mapping. In such an event, the whole Hadoop system would effectively be
out of service until a new namenode could be brought online.
To recover from a failed namenode in this situation, an administrator starts a new primary
namenode with one of the filesystem metadata replicas and configures datanodes and clients to use this new namenode. The new namenode is not able to serve requests until it has (i) loaded its namespace image into memory, (ii) replayed its edit log, and (iii) received enough block reports from the datanodes to leave safe mode. On large clusters with
many files and blocks, the time it takes for a namenode to start from cold can be 30
minutes or more.
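As a rough sketch of this manual recovery on the new namenode machine, the steps might look like the following; the daemon script, backup location, and metadata directory are illustrative assumptions and vary with the Hadoop version and installation layout.

# Copy a surviving metadata replica (for example, from an NFS mount) into
# the directory named by dfs.namenode.name.dir; both paths are hypothetical.
cp -r /mnt/nn-metadata-backup/current /data/dfs/name/

# Start the namenode daemon on the new machine (Hadoop 2.x style script).
sbin/hadoop-daemon.sh start namenode

# The namenode serves requests only after loading the image, replaying the
# edit log, and receiving enough block reports; poll until safe mode is OFF.
hdfs dfsadmin -safemode get

Datanodes and clients must also be pointed at the new namenode, for example by reusing the old namenode's hostname or by updating their configuration, before normal operation resumes.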