Exploring HDFS Federation and Its High Availability - Cloudera Administration

Implementing HDFS Federation

HDFS Federation is a technique for splitting the filesystem namespace into multiple
parts. Each part is managed by its own namenode, so the cluster runs multiple
namenodes.
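
As an illustration of the split, the following is a minimal sketch, assuming the namespace is divided by path prefix. The prefixes, the MOUNT_TABLE name, and the resolve_namenode function are hypothetical and only show how each path ends up with exactly one namenode.

```python
# Hypothetical mount table: each part of the namespace (a path prefix)
# is owned by exactly one namenode. The prefixes are made up.
MOUNT_TABLE = {
    "/user": "NN1",
    "/data": "NN2",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the namenode that owns the longest prefix matching path."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/database" does not match "/data".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namenode manages {path!r}")
    return mount_table[best]
```

With this table, resolve_namenode("/user/alice/file.txt") returns "NN1", and a path outside every prefix raises KeyError.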

In the following diagram, you will see two namenodes, Namenode-1 (NN1) and
Namenode-2 (NN2).

Each namenode manages a namespace volume, which consists of the namespace metadata
and the block pool information. The namespace metadata describes the files and
directories present in HDFS and maps each file to its blocks. A block pool is the
collection of data blocks that belong to a single namespace in a Hadoop cluster.

Both namenodes share the same set of datanodes in the cluster, and the datanodes
store blocks for each of the namenodes. However, the two namenodes do not
communicate with each other. The preceding diagram shows only two namenodes, but a
production environment may have more than two.
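
The relationship between independent namenodes and shared datanodes can be sketched as a toy model. The class names, the pool IDs BP-1 and BP-2, and the round-robin placement without replication are assumptions made for illustration and do not reflect the actual HDFS implementation.

```python
from collections import defaultdict


class DataNode:
    """Stores blocks for every namenode, kept apart by block pool ID."""

    def __init__(self, name):
        self.name = name
        self.pools = defaultdict(set)  # block pool ID -> set of block IDs

    def store(self, pool_id, block_id):
        self.pools[pool_id].add(block_id)


class NameNode:
    """Owns one namespace volume: namespace metadata plus one block pool."""

    def __init__(self, name, pool_id, datanodes):
        self.name = name
        self.pool_id = pool_id       # hypothetical pool ID
        self.datanodes = datanodes   # the same list is shared by all namenodes
        self.namespace = {}          # file path -> list of block IDs
        self._next_block = 0

    def write_file(self, path, num_blocks):
        blocks = []
        for _ in range(num_blocks):
            block_id = self._next_block
            self._next_block += 1
            # Simplified placement: round robin, no replication.
            datanode = self.datanodes[block_id % len(self.datanodes)]
            datanode.store(self.pool_id, block_id)
            blocks.append(block_id)
        self.namespace[path] = blocks
        return blocks


# Two namenodes share the same datanodes but never talk to each other.
datanodes = [DataNode("dn1"), DataNode("dn2")]
nn1 = NameNode("NN1", "BP-1", datanodes)
nn2 = NameNode("NN2", "BP-2", datanodes)
nn1.write_file("/user/a.txt", 3)
nn2.write_file("/data/b.csv", 2)
```

In this model both namenodes hand out a block numbered 0, yet the blocks never collide on a datanode because each one is filed under its own block pool ID. Each namenode also sees only the files in its own namespace.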

With this architecture in place, the cluster can scale to a large number of nodes,
because the memory of a single namenode is no longer the limiting factor. Read/write
throughput also improves significantly, because the load is no longer on a single
