Namenode
The namenode daemon is a master daemon and is responsible for storing all the location
information of the files present in HDFS. The actual data is never stored on a namenode. In
other words, it holds the metadata of the files in HDFS.
The namenode maintains the entire metadata in RAM, which lets it respond quickly to client
read requests. Therefore, it is important to run the namenode on a machine that has plenty
of RAM at its disposal: the more files there are in HDFS, the more RAM the namenode
consumes. The namenode daemon also maintains a persistent checkpoint of the metadata in a
file stored on disk, called the fsimage file.
Whenever a file is created, deleted, or updated in the cluster, an entry describing the
action is appended to a file called the edits log. After the edits log is updated, the
in-memory metadata is updated accordingly. It is important to note that the fsimage file is
not updated for every write operation.
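The write path described above can be modelled in a few lines of Python. This is a minimal, hypothetical sketch, not Hadoop's implementation: the class `ToyNamenode`, the JSON-lines edits log format, and the `dn1:blk_1` style block labels are all invented for illustration, and only creation and deletion are modelled. It shows the ordering that matters: each change is appended to the edits log first and applied to the in-memory metadata second, while no fsimage file is touched.

```python
import json
import os
import tempfile


class ToyNamenode:
    """Hypothetical, simplified model of the namenode write path.

    Metadata lives in an in-memory dict (path -> list of block labels).
    Every mutation is first appended to an edits log on disk and then
    applied in memory. No fsimage checkpoint is written here.
    """

    def __init__(self, edits_path):
        self.edits_path = edits_path
        self.namespace = {}  # in-memory metadata: path -> block labels

    def _log(self, op, path, blocks=None):
        # Append one JSON record per operation to the edits log.
        with open(self.edits_path, "a") as f:
            f.write(json.dumps({"op": op, "path": path, "blocks": blocks}) + "\n")

    def create(self, path, blocks):
        self._log("create", path, blocks)    # 1. record the action in the edits log
        self.namespace[path] = list(blocks)  # 2. update the in-memory metadata

    def delete(self, path):
        self._log("delete", path)
        self.namespace.pop(path, None)

    def lookup(self, path):
        # Reads are answered from RAM only; the disk is not consulted.
        return self.namespace.get(path)


if __name__ == "__main__":
    edits = os.path.join(tempfile.mkdtemp(), "edits.log")
    nn = ToyNamenode(edits)
    nn.create("/logs/a.txt", ["dn1:blk_1", "dn2:blk_1"])
    nn.create("/logs/b.txt", ["dn3:blk_2"])
    nn.delete("/logs/b.txt")
    print(nn.lookup("/logs/a.txt"))  # ['dn1:blk_1', 'dn2:blk_1']
    with open(edits) as f:
        print(len(f.readlines()))    # 3
```

Because the edits log is append-only, each write costs one small sequential disk append rather than a rewrite of the whole checkpoint.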
If the namenode daemon is restarted, the following sequence of events occurs at namenode
startup:
1. Read the fsimage file from disk and load it into memory (RAM).
2. Read the actions recorded in the edits log and apply each one to the in-memory
representation of the fsimage file.
3. Write the modified in-memory representation back to the fsimage file on disk.
These steps ensure that the in-memory representation reflects every change recorded in the
edits log.
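The three startup steps can be sketched as follows. This is a hypothetical model, not Hadoop's code: it assumes the fsimage is a JSON object mapping paths to block lists and that the edits log holds one JSON record per line, whereas real Hadoop uses its own on-disk formats. The function also empties the edits log after writing the new fsimage; that step is an assumption not stated above, added so that already-merged actions are not replayed on the next restart.

```python
import json
import os
import tempfile


def apply_edit(namespace, record):
    """Apply one (hypothetical) edits-log record to the in-memory namespace."""
    if record["op"] == "create":
        namespace[record["path"]] = record["blocks"]
    elif record["op"] == "delete":
        namespace.pop(record["path"], None)
    else:
        raise ValueError(f"unknown op: {record['op']}")


def start_namenode(fsimage_path, edits_path):
    # Step 1: load the fsimage checkpoint from disk into memory.
    namespace = {}
    if os.path.exists(fsimage_path):
        with open(fsimage_path) as f:
            namespace = json.load(f)

    # Step 2: replay every action recorded in the edits log, in order.
    if os.path.exists(edits_path):
        with open(edits_path) as f:
            for line in f:
                if line.strip():
                    apply_edit(namespace, json.loads(line))

    # Step 3: write the merged state back as the new fsimage.
    tmp = fsimage_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(namespace, f)
    os.replace(tmp, fsimage_path)  # swap in the new checkpoint in one step

    # Assumption: start a fresh, empty edits log once it has been merged.
    open(edits_path, "w").close()
    return namespace


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    fsimage = os.path.join(d, "fsimage")
    edits = os.path.join(d, "edits")
    with open(fsimage, "w") as f:
        json.dump({"/a": ["blk_1"]}, f)
    with open(edits, "w") as f:
        f.write(json.dumps({"op": "create", "path": "/b", "blocks": ["blk_2"]}) + "\n")
        f.write(json.dumps({"op": "delete", "path": "/a", "blocks": None}) + "\n")
    print(start_namenode(fsimage, edits))  # {'/b': ['blk_2']}
```

Writing to a temporary file and then calling `os.replace` means a crash during step 3 leaves the previous fsimage intact rather than a half-written one.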
The namenode daemon is a single point of failure in Hadoop 1.x: if the node hosting the
namenode daemon fails, the filesystem becomes unusable. To mitigate this, the administrator
has to configure the namenode to write the fsimage file to the local disk as well as to a
remote disk on the network. This backup on the remote disk can be used to restore the
namenode on a freshly installed server. Newer versions of Apache Hadoop (2.x) support High
Availability (HA), which deploys two namenodes in an active/passive configuration: if the
active namenode fails, control passes to the passive namenode, which becomes active. This
configuration reduces downtime in case of a namenode failure.
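The active/passive idea can be illustrated with a small, hypothetical Python model. The names `Namenode`, `HAPair`, and `check_and_failover` are invented for this sketch, which only shows the role swap on failure. It deliberately leaves out how the passive namenode keeps its metadata current and how failures are detected in a real Hadoop 2.x cluster.

```python
class Namenode:
    """Hypothetical stand-in for a namenode process."""

    def __init__(self, name):
        self.name = name
        self.alive = True


class HAPair:
    """Hypothetical active/passive pair of namenodes."""

    def __init__(self, first, second):
        self.active = first    # serves client requests
        self.standby = second  # waits to take over

    def check_and_failover(self):
        # Healthy active namenode: nothing to do.
        if self.active.alive:
            return self.active.name
        # Active is down but the passive one is healthy: swap roles.
        if self.standby.alive:
            self.active, self.standby = self.standby, self.active
            return self.active.name
        # Both are down: the filesystem is unavailable.
        raise RuntimeError("no healthy namenode available")


if __name__ == "__main__":
    pair = HAPair(Namenode("nn1"), Namenode("nn2"))
    print(pair.check_and_failover())  # nn1
    pair.active.alive = False         # simulate a crash of nn1
    print(pair.check_and_failover())  # nn2
```

The failed node is kept as the new standby, so once it is repaired it can take over again if the current active namenode fails.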