Database Reference
In-Depth Information
Second, making the unit of abstraction a block rather than a file simplifies the storage sub-
system. Simplicity is something to strive for in all systems, but it is especially important
for a distributed system in which the failure modes are so varied. The storage subsystem
deals with blocks, simplifying storage management (because blocks are a fixed size, it is
easy to calculate how many can be stored on a given disk) and eliminating metadata con-
cerns (because blocks are just chunks of data to be stored, file metadata such as permis-
sions information does not need to be stored with the blocks, so another system can
handle metadata separately).
Furthermore, blocks fit well with replication for providing fault tolerance and availability.
To insure against corrupted blocks and disk and machine failure, each block is replicated
to a small number of physically separate machines (typically three). If a block becomes
unavailable, a copy can be read from another location in a way that is transparent to the
client. A block that is no longer available due to corruption or machine failure can be rep-
licated from its alternative locations to other live machines to bring the replication factor
back to the normal level. (See Data Integrity for more on guarding against corrupt data.)
Similarly, some applications may choose to set a high replication factor for the blocks in a
popular file to spread the read load on the cluster.
Like its disk filesystem cousin, HDFS's fsck command understands blocks. For ex-
ample, running:
% hdfs fsck / -files -blocks
will list the blocks that make up each file in the filesystem. (See also Filesystem check
(fsck) . )
Namenodes and Datanodes
An HDFS cluster has two types of nodes operating in a master−worker pattern: a namen-
ode (the master) and a number of datanodes (workers). The namenode manages the
filesystem namespace. It maintains the filesystem tree and the metadata for all the files
and directories in the tree. This information is stored persistently on the local disk in the
form of two files: the namespace image and the edit log. The namenode also knows the
datanodes on which all the blocks for a given file are located; however, it does not store
block locations persistently, because this information is reconstructed from datanodes
when the system starts.
A client accesses the filesystem on behalf of the user by communicating with the namen-
ode and datanodes. The client presents a filesystem interface similar to a Portable Operat-
ing System Interface (POSIX), so the user code does not need to know about the namen-
ode and datanodes to function.
Search WWH ::




Custom Search