a Blockreport to the NameNode. This Blockreport is formed by scanning through its local
file system and generating a list of all HDFS data blocks that correspond to local files. The
NameNode, already in Safemode, checks the reported data blocks, exits Safemode, compiles a list
of data blocks whose replication degree is below the configured target, and, if necessary,
stabilizes the file system by replicating those blocks.
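As an illustration only (not the actual HDFS implementation), the Java sketch below shows how a DataNode might assemble a Blockreport by scanning a local storage directory for block files; the flat directory layout and the blk_<id> file-naming pattern are assumptions made for this example.

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class BlockReportSketch {
    // Scans a local storage directory and collects the ids of all block files,
    // assuming block files are named "blk_<id>" (checksum/meta files are skipped).
    public static List<Long> buildBlockReport(File storageDir) {
        List<Long> blockIds = new ArrayList<>();
        File[] files = storageDir.listFiles();
        if (files == null) {
            return blockIds; // directory missing or unreadable
        }
        for (File f : files) {
            String name = f.getName();
            if (f.isFile() && name.startsWith("blk_")) {
                try {
                    blockIds.add(Long.parseLong(name.substring(4)));
                } catch (NumberFormatException ignored) {
                    // not a plain block file (e.g. a checksum file); skip it
                }
            }
        }
        return blockIds;
    }

    public static void main(String[] args) {
        List<Long> report = buildBlockReport(new File("/tmp/dn-storage"));
        System.out.println("Reporting " + report.size() + " blocks to the NameNode");
    }
}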
Failures The NameNode relies exclusively on heartbeat messages from DataNodes to
maintain a list of healthy nodes. A network partition can cause some of the DataNodes to
lose connectivity with the NameNode. The absence of heartbeats at the NameNode suggests
faulty or dead nodes. In such a case, the NameNode stops forwarding any new I/O requests to
the presumably dead nodes.
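A minimal sketch of this heartbeat bookkeeping follows; the class name, the ten-minute timeout, and the node identifiers are assumptions made for illustration, not HDFS internals.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatMonitor {
    private static final long HEARTBEAT_TIMEOUT_MS = 10 * 60 * 1000; // assumed 10-minute timeout
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called whenever a heartbeat arrives from a DataNode.
    public void recordHeartbeat(String dataNodeId) {
        lastHeartbeat.put(dataNodeId, System.currentTimeMillis());
    }

    // A node that has not reported within the timeout is treated as dead.
    public boolean isAlive(String dataNodeId) {
        Long last = lastHeartbeat.get(dataNodeId);
        return last != null && System.currentTimeMillis() - last < HEARTBEAT_TIMEOUT_MS;
    }

    // The NameNode would skip dead nodes when choosing targets for new I/O.
    public boolean canReceiveNewIo(String dataNodeId) {
        return isAlive(dataNodeId);
    }
}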
The replication degree of data blocks can drop below the specified value for several reasons:
a DataNode may fail or become unavailable, a replica may become corrupted, a disk drive may
fail, or the replication factor of a file may be increased. In such cases, the NameNode initiates
the necessary replication of the affected data blocks.
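The sketch below illustrates this under-replication check in its simplest form, assuming hypothetical maps from block id to live replica count and to the file's replication factor; it is not the NameNode's actual implementation.

import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;

public class ReplicationMonitor {
    private final Queue<Long> needsReplication = new ArrayDeque<>();

    // liveReplicas: block id -> number of replicas on live DataNodes
    // targetFactor: block id -> replication factor configured for the file
    public void scan(Map<Long, Integer> liveReplicas, Map<Long, Integer> targetFactor) {
        for (Map.Entry<Long, Integer> e : liveReplicas.entrySet()) {
            int target = targetFactor.getOrDefault(e.getKey(), 3); // assumed default factor of 3
            if (e.getValue() < target) {
                needsReplication.add(e.getKey()); // schedule a copy to another DataNode
            }
        }
    }

    public Queue<Long> pendingWork() {
        return needsReplication;
    }
}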
Cluster Rebalancing HDFS maintains balance between DataNodes. If the free space on a DataNode
falls below a certain threshold, data blocks are migrated from that DataNode to another.
Similarly, in the event of a spike in demand for a particular file, a scheme might
dynamically create additional replicas and rebalance other data blocks in the cluster.
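As a sketch of the rebalancing rule just described, the code below flags any DataNode whose free space is below a threshold and picks the node with the most free space as a migration target; the method names, threshold, and sizes are illustrative assumptions.

import java.util.Map;

public class RebalanceSketch {
    public static void rebalance(Map<String, Long> freeSpaceBytes, long thresholdBytes) {
        for (Map.Entry<String, Long> e : freeSpaceBytes.entrySet()) {
            if (e.getValue() < thresholdBytes) {
                // pick the DataNode with the most free space as the migration target
                String target = freeSpaceBytes.entrySet().stream()
                        .filter(t -> !t.getKey().equals(e.getKey())) // never pick the source itself
                        .max(Map.Entry.comparingByValue())
                        .map(Map.Entry::getKey)
                        .orElse(null);
                System.out.println("Migrate blocks from " + e.getKey() + " to " + target);
            }
        }
    }

    public static void main(String[] args) {
        // free space per node in bytes; 50 GiB threshold
        rebalance(Map.of("dn1", 5L << 30, "dn2", 900L << 30, "dn3", 400L << 30), 50L << 30);
    }
}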
Data Integrity Data corruption is a well-known problem and can be caused by disk drive faults
(hard failures), network faults, and buggy application software. It is possible that a data block
fetched from a DataNode arrives in a corrupted state. The HDFS client software therefore imple-
ments checksum procedures on HDFS files. A checksum is computed for each block of a file and
stored in a separate checksum file in the same HDFS namespace. When the client retrieves a file,
it verifies that the data it receives matches the checksum stored in the corresponding checksum
file. If the checksums do not match, the client can opt to retrieve a replica of the block from
another DataNode.
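To make the check concrete, here is a minimal Java sketch that computes a CRC32 checksum per block and compares it against a stored value; the real HDFS client uses its own checksum file format, so this only illustrates the check-and-fall-back idea under assumed names.

import java.util.zip.CRC32;

public class ChecksumCheckSketch {
    public static long checksumOf(byte[] blockData) {
        CRC32 crc = new CRC32();
        crc.update(blockData);
        return crc.getValue();
    }

    // Returns true if the block fetched from a DataNode matches the stored checksum.
    public static boolean verify(byte[] blockData, long storedChecksum) {
        return checksumOf(blockData) == storedChecksum;
    }

    public static void main(String[] args) {
        byte[] block = "example block contents".getBytes();
        long stored = checksumOf(block); // computed when the block was written
        if (!verify(block, stored)) {
            // in HDFS the client would retry the read from another replica
            System.out.println("Corrupt replica: fetch the block from another DataNode");
        } else {
            System.out.println("Block verified");
        }
    }
}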
Metadata Disk Failure The FsImage and EditLog are central data structures of HDFS
and critical to its correct functional state. The NameNode can be configured to maintain
multiple copies of these files (for example, by provisioning a secondary NameNode). An update
to either of these files causes all copies to be updated synchronously. Updating multiple copies
may degrade NameNode performance. However, since HDFS applications are typically data
intensive rather than metadata intensive, the chances of this degradation preventing correct
behavior are limited. During startup, the NameNode selects the latest consistent copies of the
FsImage and EditLog.
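A sketch of the "multiple synchronous copies" idea follows: each edit is appended and flushed to every configured EditLog directory before the operation is considered durable. The directory paths, file name, and record format are assumptions for illustration, not the NameNode's actual storage layout.

import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class EditLogSketch {
    private final List<Path> editLogDirs;

    public EditLogSketch(List<Path> editLogDirs) {
        this.editLogDirs = editLogDirs;
    }

    // Appends one edit record to every copy of the EditLog, synchronously.
    public void logEdit(String record) throws IOException {
        for (Path dir : editLogDirs) {
            Files.createDirectories(dir);
            try (FileWriter w = new FileWriter(dir.resolve("edits.log").toFile(), true)) {
                w.write(record + System.lineSeparator());
                w.flush(); // each copy is flushed before the edit is acknowledged
            }
        }
    }

    public static void main(String[] args) throws IOException {
        EditLogSketch log = new EditLogSketch(
                List.of(Path.of("/tmp/nn-meta-1"), Path.of("/tmp/nn-meta-2")));
        log.logEdit("OP_MKDIR /user/data");
    }
}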
One drawback of such a design is that the NameNode forms a single point of failure
for an HDFS cluster. However, election algorithms can be implemented that select a
secondary NameNode (if provisioned) as the new acting NameNode.
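Purely as an illustration of such an election (not a description of any real HDFS mechanism), the sketch below promotes the candidate whose EditLog carries the highest transaction id; the candidate names and transaction ids are invented for the example.

import java.util.Map;

public class FailoverSketch {
    // Picks the candidate whose EditLog is most up to date.
    public static String electNewNameNode(Map<String, Long> lastTxnIdByCandidate) {
        return lastTxnIdByCandidate.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("no secondary NameNode provisioned"));
    }

    public static void main(String[] args) {
        System.out.println(electNewNameNode(Map.of("nn-standby-1", 1042L, "nn-standby-2", 1040L)));
    }
}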