Chapter 4. Exploring HDFS Federation and Its High Availability
You are now ready to set up a Hadoop cluster using CDH5. Once the cluster is up and
running, you are responsible for managing it and keeping it available at all times.
In this chapter, we will cover some techniques to manage HDFS efficiently and to
handle the single point of failure in a Hadoop cluster. We will cover the following
topics:
• Configuring HDFS Federation
• HDFS high availability using Quorum-based storage and shared storage using
Network File System (NFS)
• JobTracker high availability
The heart of HDFS is the namenode. The namenode manages the locations of all data
blocks in the cluster. To serve requests faster, the namenode keeps all of this
information in memory. For small clusters, this information is lightweight, and in
most cases a decent amount of RAM is enough to hold everything required to maintain
the cluster. However, as the number of datanodes grows and the cluster hosts a large
number of files and blocks, the RAM may fall short and limit the scalability of the
cluster. HDFS Federation was built to address this problem.
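To make the memory pressure concrete, the following is a rough back-of-the-envelope sketch, not a sizing tool. It assumes a commonly quoted rule of thumb of about 150 bytes of namenode heap per namespace object (file, directory, or block); that figure, the function names, and the sample file counts are illustrative assumptions and do not come from this chapter.

```python
# Rough estimate of namenode heap needed for namespace metadata.
# ASSUMPTION: about 150 bytes per namespace object (file, directory, block).
# This is a rule of thumb, not a measured or guaranteed value.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files, blocks_per_file, num_dirs=0):
    """Return an approximate heap size, in bytes, for the given namespace."""
    num_blocks = num_files * blocks_per_file
    total_objects = num_files + num_dirs + num_blocks
    return total_objects * BYTES_PER_OBJECT


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / (1024 ** 3)


if __name__ == "__main__":
    # Hypothetical cluster sizes, each file assumed to span two blocks.
    for files in (1_000_000, 100_000_000, 1_000_000_000):
        heap = estimate_namenode_heap_bytes(files, blocks_per_file=2)
        print(f"{files:>13,} files -> about {to_gib(heap):.1f} GiB of heap")
```

Under these assumptions the estimate grows linearly with the number of files and blocks, which is why a single namenode eventually becomes the limiting factor as the cluster grows.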