MapReduce - Hadoop: The Definitive Guide

Database Reference

In-Depth Information

Second, making the unit of abstraction a block rather than a file simplifies the storage sub-

system. Simplicity is something to strive for in all systems, but it is especially important

for a distributed system in which the failure modes are so varied. The storage subsystem

deals with blocks, simplifying storage management (because blocks are a fixed size, it is

easy to calculate how many can be stored on a given disk) and eliminating metadata con-

cerns (because blocks are just chunks of data to be stored, file metadata such as permis-

sions information does not need to be stored with the blocks, so another system can

handle metadata separately).

Furthermore, blocks fit well with replication for providing fault tolerance and availability.

To insure against corrupted blocks and disk and machine failure, each block is replicated

to a small number of physically separate machines (typically three). If a block becomes

unavailable, a copy can be read from another location in a way that is transparent to the

client. A block that is no longer available due to corruption or machine failure can be rep-

licated from its alternative locations to other live machines to bring the replication factor

back to the normal level. (See Data Integrity for more on guarding against corrupt data.)

Similarly, some applications may choose to set a high replication factor for the blocks in a

popular file to spread the read load on the cluster.

Like its disk filesystem cousin, HDFS's fsck command understands blocks. For ex-

ample, running:

% hdfs fsck / -files -blocks

will list the blocks that make up each file in the filesystem. (See also Filesystem check

(fsck) . )

Namenodes and Datanodes

An HDFS cluster has two types of nodes operating in a master−worker pattern: a namen-

ode (the master) and a number of datanodes (workers). The namenode manages the

filesystem namespace. It maintains the filesystem tree and the metadata for all the files

and directories in the tree. This information is stored persistently on the local disk in the

form of two files: the namespace image and the edit log. The namenode also knows the

datanodes on which all the blocks for a given file are located; however, it does not store

block locations persistently, because this information is reconstructed from datanodes

when the system starts.

A client accesses the filesystem on behalf of the user by communicating with the namen-

ode and datanodes. The client presents a filesystem interface similar to a Portable Operat-

ing System Interface (POSIX), so the user code does not need to know about the namen-

ode and datanodes to function.

Search WWH ::

Custom Search

Home