to which DataNodes. These blocks of data are usually 64 MB (the default in
Hadoop 1.x; later releases default to 128 MB), but the setting is configurable.
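
To make the block arithmetic concrete, here is a minimal sketch of how a file of a given size divides into fixed-size blocks. The function name split_into_blocks and the constant BLOCK_SIZE are hypothetical names chosen for illustration, and the sketch assumes the final block holds only the leftover bytes rather than being padded to full size.

```python
# Illustrative only: models how a file divides into fixed-size blocks.
# BLOCK_SIZE and split_into_blocks are hypothetical names, not HDFS APIs.
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size described in the text


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block needed for the file."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # Assumption: the last block holds only the remaining bytes.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    sizes = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
    print(len(sizes), "blocks; last block is", sizes[-1] // (1024 * 1024), "MB")
```

Under these assumptions, a 200 MB file occupies three full 64 MB blocks plus one 8 MB block.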
The DataNode is responsible for the creation of blocks of data in its physical
storage and for the deletion of those blocks. It is also responsible for creation
of replica blocks from other nodes. The NameNode coordinates this activity,
telling the DataNode what blocks to create, delete, or replicate. DataNodes
communicate with the NameNode by sending a regular “heartbeat” message
over the network. This heartbeat indicates that the
DataNode is operating correctly. A block report is also delivered with the
heartbeat and provides a list of all the blocks stored on the DataNode.
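
The heartbeat and block report exchange can be pictured with a small in-memory model. This is a sketch of the idea only, not the HDFS wire protocol: the class NameNodeView, its method names, and the 30-second timeout are all assumptions made for illustration. As a simplification, the block report is optional on each heartbeat.

```python
# Illustrative model of heartbeat tracking; not the real HDFS protocol.
# NameNodeView, its methods, and the timeout value are hypothetical.
class NameNodeView:
    def __init__(self, timeout_seconds=30.0):
        self.timeout_seconds = timeout_seconds  # assumed liveness window
        self.last_heartbeat = {}  # DataNode id -> time of last heartbeat
        self.block_map = {}       # DataNode id -> set of block ids it reported

    def receive_heartbeat(self, datanode_id, now, block_report=None):
        """Record that a DataNode is alive and, if given, what it stores."""
        self.last_heartbeat[datanode_id] = now
        if block_report is not None:
            # A block report replaces the previous view of that DataNode.
            self.block_map[datanode_id] = set(block_report)

    def live_datanodes(self, now):
        """DataNodes whose last heartbeat falls within the timeout."""
        return sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen <= self.timeout_seconds
        )

    def locations(self, block_id):
        """DataNodes that reported holding the given block."""
        return sorted(
            node for node, blocks in self.block_map.items()
            if block_id in blocks
        )


if __name__ == "__main__":
    nn = NameNodeView(timeout_seconds=30.0)
    nn.receive_heartbeat("dn1", now=0.0, block_report=["b1", "b2"])
    nn.receive_heartbeat("dn2", now=0.0, block_report=["b2"])
    nn.receive_heartbeat("dn1", now=25.0)  # dn2 stays silent
    print(nn.live_datanodes(now=40.0))
    print(nn.locations("b2"))
```

In this model a DataNode that stops sending heartbeats simply drops out of live_datanodes once the timeout passes, while its last block report remains available for deciding which blocks need new replicas.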
The NameNode maintains a transaction history of all changes to the file
system, known as the EditLog. It also maintains a file, referred to as the
FsImage, that contains the file system metadata. The FsImage and EditLog
files are read by the NameNode when it starts up, and the EditLog's
transaction history is applied to the FsImage. This brings the FsImage
up-to-date with the latest changes recorded by the NameNode. Once the
FsImage is updated, it is written back to the file system, and the EditLog
is cleared. At this point, the NameNode can begin accepting requests. This
process (shown in Figure 5.1) is referred to as checkpointing, and it is run
only on startup. It can have some performance impact if the NameNode has
accumulated a large EditLog.
Figure 5.1 The checkpointing process
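
The checkpointing steps described above can be sketched as replaying a list of edits onto a metadata snapshot. The data layout here (a dictionary of paths for the FsImage and a list of tuples for the EditLog), the three operation names, and the function names are hypothetical simplifications. The real on-disk formats are different.

```python
# Illustrative checkpoint: replay an edit log onto a metadata snapshot.
# The dict/tuple layout and operation names are assumptions for this sketch.
def apply_edit(fs_image, edit):
    """Apply one logged change to the in-memory image."""
    op, path = edit[0], edit[1]
    if op == "create":
        fs_image[path] = {"blocks": list(edit[2])}
    elif op == "delete":
        fs_image.pop(path, None)
    elif op == "rename":
        fs_image[edit[2]] = fs_image.pop(path)
    else:
        raise ValueError("unknown operation: %r" % (op,))


def checkpoint(fs_image, edit_log):
    """Return an updated image and clear the log, as on NameNode startup."""
    # Work on a copy so the old image survives if replay fails partway.
    new_image = {
        path: {"blocks": list(meta["blocks"])}
        for path, meta in fs_image.items()
    }
    for edit in edit_log:  # edits are replayed in the order they were logged
        apply_edit(new_image, edit)
    edit_log.clear()       # the log is emptied once the image is up to date
    return new_image


if __name__ == "__main__":
    image = {"/a": {"blocks": ["b1"]}}
    log = [
        ("create", "/b", ["b2", "b3"]),
        ("rename", "/a", "/c"),
        ("delete", "/b"),
    ]
    print(checkpoint(image, log), log)
```

The replay cost grows with the number of logged edits, which is why a large EditLog can slow startup.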
The NameNode is a crucial component of any HDFS cluster. Without a
functioning NameNode, the data cannot be accessed. That means that the
NameNode is a single point of failure for the cluster. Because of that, the
NameNode is one place that using a more fault-tolerant hardware setup is
 
 