Secondary NameNode into its own machine, the spec of which should be comparable
to the NameNode. But before going into how to set up a separate server as a
Secondary NameNode, I should explain what the Secondary NameNode does and
doesn't do and, in turn, some of NameNode's underlying mechanics.
Due to its unfortunate naming, the Secondary NameNode (SNN) is sometimes
confused with a failover backup for NameNode. It most certainly is not. The SNN
only serves to periodically consolidate the filesystem state information that
NameNode persists, which keeps NameNode efficient. NameNode manages the
filesystem's state information using two files, FsImage and EditLog. The file FsImage is
a snapshot of the filesystem at some checkpoint, and EditLog records each incremental
change (delta) to the filesystem after that checkpoint. These two files can completely
determine the current state of the filesystem. When you initialize NameNode, it merges
these two files to create a new snapshot. At the end of NameNode's initialization,
FsImage will contain the new snapshot and EditLog will be empty. Afterward, any
operation that changes the state of HDFS is appended to EditLog, whereas FsImage will
remain unchanged. When you shut down NameNode and restart it, the consolidation
will take place again and make a new snapshot. Note that the two files are only for
retaining the filesystem's state information while NameNode is not running (either
intentionally shut down or due to system malfunction). NameNode keeps in memory
a constantly maintained copy of the filesystem's state information to quickly answer
queries about the filesystem.
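To make the FsImage/EditLog relationship concrete, here is a toy model in Python. It is a sketch under loudly stated assumptions: the namespace is just a dict from path to replication factor, edits are tuples, and the names `apply_edit`, `merge`, and `ToyNameNode` are hypothetical; the real files are binary and record much more. It mirrors the behavior described above: startup merges the two files, later changes touch only the in-memory copy and EditLog, and FsImage plus EditLog still determine the current state.

```python
# Toy model (assumption): the namespace maps a file path to its replication
# factor, and each EditLog entry is a tuple (op, path, *args). Real FsImage and
# EditLog files are binary and record far more metadata.


def apply_edit(namespace, edit):
    """Apply one incremental change (delta) to a namespace dict in place."""
    op, path, *args = edit
    if op in ("create", "setrep"):
        namespace[path] = args[0]
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown edit {op!r}")


def merge(fsimage, editlog):
    """Replay EditLog on top of FsImage; return (new FsImage, empty EditLog)."""
    snapshot = dict(fsimage)  # leave the old checkpoint untouched
    for edit in editlog:
        apply_edit(snapshot, edit)
    return snapshot, []


class ToyNameNode:
    """Hypothetical stand-in for NameNode's state handling."""

    def __init__(self, fsimage, editlog):
        # Startup: consolidate the two files into a new snapshot and empty log.
        self.fsimage, self.editlog = merge(fsimage, editlog)
        # In-memory copy used to answer queries while NameNode runs.
        self.namespace = dict(self.fsimage)

    def mutate(self, edit):
        apply_edit(self.namespace, edit)  # update the live state...
        self.editlog.append(edit)         # ...and log the delta; FsImage is unchanged


nn = ToyNameNode({"/a": 3}, [("create", "/b", 2), ("delete", "/a")])
nn.mutate(("setrep", "/b", 3))
print(nn.fsimage)    # {'/b': 2}
print(nn.editlog)    # [('setrep', '/b', 3)]
# A restart replays both files and recovers the in-memory state.
print(merge(nn.fsimage, nn.editlog)[0] == nn.namespace)  # True
```

The last line plays the role of a restart: replaying the same two inputs reproduces what NameNode held in memory.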
On a busy cluster, the EditLog file will grow quite large, and the next restart of
NameNode will take a long time to merge EditLog into FsImage. Busy clusters can
also go a long time between NameNode restarts, and you may want more
frequent snapshots for archival purposes. This is where the SNN comes in. It consolidates
FsImage and EditLog into a new snapshot and leaves the NameNode alone to serve
live traffic. Therefore, it's more appropriate to think of the SNN as a checkpointing
server. Merging FsImage and EditLog is memory intensive, requiring an amount of
memory on the same order as normal NameNode operation. It's best for the SNN to
be on a separate server that is as powerful as the primary NameNode.
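The payoff of a checkpointing server can be sketched with a small simulation. Its assumptions are not Hadoop's actual protocol: FsImage is a dict, each EditLog entry is a `(path, value)` pair, and `checkpoint_every` is a hypothetical knob standing in for however often the SNN runs. The file transfers between the two machines and the handoff of the log are glossed over. What the sketch tracks is how much EditLog a NameNode restart would still have to replay.

```python
# Simplified model (assumption): FsImage is a dict, and each EditLog entry is a
# (path, value) pair, where value None means the path was deleted.
# checkpoint_every is a hypothetical knob, not a Hadoop property name.


def checkpoint(fsimage, editlog):
    """The SNN's job in this model: fold EditLog into a new FsImage."""
    merged = dict(fsimage)
    for path, value in editlog:
        if value is None:
            merged.pop(path, None)  # deletion
        else:
            merged[path] = value    # creation or update
    return merged


def simulate(num_edits, checkpoint_every=None):
    """Return (FsImage, EditLog) as they stand after num_edits changes."""
    fsimage, editlog = {}, []
    for i in range(num_edits):
        editlog.append((f"/file{i % 100}", i))  # NameNode logs every change
        if checkpoint_every and len(editlog) >= checkpoint_every:
            fsimage = checkpoint(fsimage, editlog)  # merged off the NameNode
            editlog = []                            # NameNode continues with a fresh log
    return fsimage, editlog


img, log = simulate(10_000)
img2, log2 = simulate(10_000, checkpoint_every=3_000)
print(len(log), len(log2))  # 10000 1000
# Same logical state either way; only the restart backlog differs.
print(checkpoint(img, log) == checkpoint(img2, log2))  # True
```

Both runs end in the same logical state. The periodic checkpoints only cap how much EditLog is left to merge at restart.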
To configure HDFS to use a separate server as the SNN, first list that server's host
name or IP address in the conf/masters file. Unfortunately, this file name is also
confusing. The masters in Hadoop (NameNode and JobTracker) run on whichever
machines you execute bin/start-dfs.sh and bin/start-mapred.sh on, respectively.
The host listed in conf/masters runs the SNN, not either of the masters.
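The naming mix-up can be restated as a small sketch of where bin/start-dfs.sh places the HDFS daemons discussed here: NameNode on the machine where the script runs, and the SNN on each host read from conf/masters. This is an illustration, not the script itself. The function names and the host snn.hadoop-host.com are hypothetical, and the parsing rule (one host per line, blank lines and `#` comments skipped) is an assumption modeled on how Hadoop's helper scripts read host lists.

```python
import socket


def snn_hosts(masters_text):
    """Hosts named in conf/masters, i.e. where the SNN runs (not the masters)."""
    hosts = []
    for line in masters_text.splitlines():
        host = line.split("#", 1)[0].strip()  # assumed: '#' starts a comment
        if host:                               # skip blank lines
            hosts.append(host)
    return hosts


def dfs_daemon_placement(masters_text, local_host=None):
    """Sketch of where bin/start-dfs.sh places NameNode and the SNN."""
    local_host = local_host or socket.gethostname()
    return {
        "namenode": [local_host],  # the machine running the script
        "secondarynamenode": snn_hosts(masters_text),
    }


masters = """
# Despite the file name, this is the Secondary NameNode
snn.hadoop-host.com
"""
print(dfs_daemon_placement(masters, local_host="namenode.hadoop-host.com"))
# {'namenode': ['namenode.hadoop-host.com'], 'secondarynamenode': ['snn.hadoop-host.com']}
```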
You should also modify the conf/hdfs-site.xml file on the SNN so that the
dfs.http.address property points to port 50070 of the NameNode's host address, like this:
<property>
  <name>dfs.http.address</name>
  <value>namenode.hadoop-host.com:50070</value>
</property>
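Because the SNN depends on this value to reach the NameNode, a malformed entry (for example, stray spaces around the host name) is worth catching early. The following Python sketch reads a `*-site.xml` file with the standard library's `xml.etree.ElementTree` and splits `dfs.http.address` into host and port. The helper names are hypothetical, and the check is deliberately simple: it does not resolve the host or probe the port.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the <value> of the named <property> in a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def namenode_http_endpoint(xml_text):
    """Split dfs.http.address into (host, port) and sanity-check the format."""
    value = get_property(xml_text, "dfs.http.address")
    if value is None:
        raise KeyError("dfs.http.address is not set")
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


hdfs_site = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>namenode.hadoop-host.com:50070</value>
  </property>
</configuration>"""
print(namenode_http_endpoint(hdfs_site))  # ('namenode.hadoop-host.com', 50070)
```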
You should set this property because the SNN retrieves FsImage and EditLog from the
NameNode by sending HTTP GET requests to the URLs: