Storing and Managing Data in HDFS - Microsoft Big Data Solutions

advisable. In addition, setting up a Backup node may help you recover more

quickly in the event of a NameNode failure. The Backup node maintains

its own copy of the FsImage and EditLog. It receives all the file system

transactions from the NameNode and uses them to keep its copy of the

FsImage up to date. If the NameNode fails catastrophically, you can use

the Backup node's copy of the FsImage to start up a new NameNode more

quickly.
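
The way a Backup node stays current can be sketched with a toy model. This is only an illustration, not Hadoop code: the namespace is a plain dictionary, each edit is a hypothetical tuple, and the names apply_edit and ToyBackupNode are invented for this example.

```python
# Toy model (not Hadoop code): the namespace image is a dict of
# path -> metadata, and each edit is a tuple describing one transaction.

def apply_edit(fsimage, edit):
    """Apply a single file system transaction to an in-memory image."""
    op, path = edit[0], edit[1]
    if op == "create":
        fsimage[path] = {"replication": edit[2]}
    elif op == "delete":
        fsimage.pop(path, None)
    elif op == "set_replication":
        fsimage[path]["replication"] = edit[2]
    else:
        raise ValueError(f"unknown operation: {op}")


class ToyBackupNode:
    """Keeps its own image and edit log in step with a stream of edits."""

    def __init__(self):
        self.fsimage = {}
        self.editlog = []

    def receive(self, edit):
        # Record the transaction, then bring the local image up to date.
        self.editlog.append(edit)
        apply_edit(self.fsimage, edit)


# Hypothetical transactions streamed from the NameNode.
edits = [
    ("create", "/data/a.csv", 3),
    ("create", "/data/b.csv", 3),
    ("set_replication", "/data/a.csv", 2),
    ("delete", "/data/b.csv"),
]

backup = ToyBackupNode()
for edit in edits:
    backup.receive(edit)

# After a NameNode failure, a replacement could start from this image
# instead of replaying the whole history itself.
new_namenode_image = dict(backup.fsimage)
print(new_namenode_image)
```

Because the image is updated as each transaction arrives, the replacement starts from current state, which is why recovery is quicker.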

NOTE

Despite their name, Backup nodes aren't a direct backup to a

NameNode. Rather, they manage the checkpointing process and retain

a backup copy of the FsImage and EditLog. A NameNode cannot fail

over to a Backup node automatically.

NOTE

Hadoop 2.0 includes several improvements to the

availability of NameNodes, with support for Active and Standby

NameNodes. These new options will make it much easier to have a

highly available HDFS cluster.

Data Replication

One of the critical features of HDFS is its support for data replication.

Replication creates redundancy in the data, which allows HDFS to be

resilient to the failure of one or more nodes. Without this capability, HDFS

would not be reliable enough to run on commodity hardware and, as a result, would

require significantly more investment in highly available servers.
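
A small simulation can make the resilience argument concrete. It is a sketch under stated assumptions: the node names, the block count, and the round-robin placement are all hypothetical, and HDFS's real placement policy is more sophisticated than this. The point is only that with three replicas on distinct nodes, no two node failures can remove every copy of a block.

```python
import itertools


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    This placement rule is an assumption made for the example only.
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = {
            nodes[(block + i) % len(nodes)] for i in range(replication)
        }
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have a replica on a live node."""
    return {b for b, holders in placement.items() if holders - failed}


nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
placement = place_replicas(num_blocks=10, nodes=nodes, replication=3)

# Check every possible pair of simultaneous node failures.
survives_any_two = all(
    len(readable_blocks(placement, set(pair))) == 10
    for pair in itertools.combinations(nodes, 2)
)

# For contrast: with a single copy per block, one failure loses data.
single_copy = place_replicas(num_blocks=10, nodes=nodes, replication=1)
lost_without_replication = 10 - len(readable_blocks(single_copy, {"node1"}))

print(survives_any_two, lost_without_replication)
```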

Data replication also enables better performance for large data sets. By

spreading copies of the data across multiple nodes, the data can be read in

parallel. This enables faster access and processing of large files.
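
The parallel-read idea can be sketched the same way. This is not the HDFS client API: the blocks dictionary, the node names, and the pick_replica rule are hypothetical stand-ins, and the "read" just returns bytes held in memory. It shows the blocks of one file being fetched concurrently from different nodes and reassembled in order.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: block id -> (nodes holding a replica, block contents).
blocks = {
    0: (["node1", "node2"], b"Hello, "),
    1: (["node2", "node3"], b"replicated "),
    2: (["node2", "node1"], b"world"),
}


def pick_replica(block_id, holders):
    """Choose which replica to read; a simple modulo rule for illustration."""
    return holders[block_id % len(holders)]


def read_block(block_id):
    """Simulate reading one block from one of the nodes that hold it."""
    holders, data = blocks[block_id]
    node = pick_replica(block_id, holders)
    return block_id, node, data


# Fetch all blocks concurrently; map() returns results in input order,
# so the file can be reassembled by simple concatenation.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(read_block, sorted(blocks)))

contents = b"".join(data for _, _, data in results)
nodes_used = [node for _, node, _ in results]

print(contents, nodes_used)
```

In this toy layout each block is served by a different node, so no single machine has to deliver the whole file.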
