The datanode daemon acts as a slave node and is responsible for storing the actual file data in HDFS. Files are split into data blocks that are distributed across the cluster; a block is typically 64 MB or 128 MB, and the block size is a configurable parameter. File blocks in a Hadoop cluster are also replicated to other datanodes for redundancy, so that no data is lost if a datanode daemon fails. The datanode daemon sends the namenode daemon information about the files and blocks stored on that node, and responds to the namenode daemon for all filesystem operations. The following diagram shows how files are stored in the cluster:
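As an aside, both of the parameters mentioned above are set per cluster in the hdfs-site.xml configuration file. A minimal sketch is shown below; the property names `dfs.blocksize` and `dfs.replication` are from the standard HDFS configuration, and the values shown (a 128 MB block size and a replication factor of 3) match the cluster described here:

```xml
<configuration>
  <!-- Block size in bytes: 128 MB = 134217728 bytes -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <!-- Number of replicas kept for each file block -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

The replication factor can also be overridden per file after the fact, but the cluster-wide default in hdfs-site.xml is what applies to newly written files unless the client specifies otherwise.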
File blocks of files A, B, and C are replicated across multiple nodes of the cluster for redundancy. This ensures the availability of data even if one of the nodes fails. You can also see that blocks of file A are present on nodes 2, 4, and 6; blocks of file B on nodes 3, 5, and 7; and blocks of file C on nodes 4, 6, and 7. The replication factor configured for this cluster is 3, which means that each file block is replicated three times across the cluster. It is the responsibility of the namenode daemon to maintain the list of files and their corresponding block locations on the cluster. Whenever a client needs to access a