HDFS is a file system and, like any other file system architecture, it must manage consistency,
recoverability, and concurrency for reliable operation. The architecture addresses these requirements
with three kinds of files: the image, the journal, and the checkpoint.
Image An image represents the metadata of the namespace (inodes and lists of blocks). On startup,
the NameNode pins the entire namespace image in memory. Keeping the image in memory enables the
NameNode to service multiple client requests concurrently.
Journal The journal is the modification log of the image, kept in the local host's native file system.
During normal operation, each client transaction is recorded in the journal, and the journal file is
flushed and synced before the acknowledgment is sent to the client. On startup or during recovery,
the NameNode can replay this journal.
Checkpoint To enable recovery, a persistent record of the image, called a checkpoint, is also stored
in the local host's native file system. The NameNode never modifies or updates a checkpoint file once
it is written; a new checkpoint file can be created at the next startup, on a restart, or on demand
when requested by the administrator or by the CheckpointNode (described later in this chapter).
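The sync-before-acknowledge discipline described for the journal can be illustrated in a few lines of Java. This is a minimal sketch, not the NameNode's actual edit-log implementation; the class name and record layout are hypothetical.

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Minimal sketch of the journal discipline: each transaction is
// appended to the journal file and flushed/synced to stable storage
// before the client is acknowledged. The record layout is
// hypothetical, not the NameNode's actual edit-log format.
public class JournalSketch {
    private final FileOutputStream file;
    private final DataOutputStream out;

    public JournalSketch(String path) throws IOException {
        this.file = new FileOutputStream(path, true); // append mode
        this.out = new DataOutputStream(file);
    }

    public void logTransaction(byte opCode, String payload) throws IOException {
        out.writeByte(opCode);   // e.g., create, delete, rename
        out.writeUTF(payload);   // affected path or attributes
        out.flush();             // push buffered bytes to the OS
        file.getFD().sync();     // force them to disk
        // Only now may the NameNode acknowledge the client.
    }
}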
HDFS startup
HDFS drives its startup sequence from the image file, which is kept in memory. Each time the
NameNode starts, it initializes the namespace image from the checkpoint file and replays all the
changes from the journal file. Once the startup sequence completes, a new checkpoint and an empty
journal are written back to the storage directories, and the NameNode starts serving client requests.
For improved redundancy and reliability, copies of the checkpoint and journal should be kept on
other servers.
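As a rough illustration of this sequence, the sketch below uses a plain Map as a stand-in for the in-memory namespace image and replays a toy journal; every name in it is illustrative rather than actual NameNode code.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Self-contained sketch of the startup sequence described above.
public class StartupSketch {
    record JournalRecord(String op, String path) {}

    public static void main(String[] args) {
        // 1. Initialize the namespace image from the checkpoint file.
        Map<String, String> image = new HashMap<>(loadCheckpoint());

        // 2. Replay every change recorded in the journal.
        for (JournalRecord rec : loadJournal()) {
            if (rec.op().equals("create"))      image.put(rec.path(), "inode");
            else if (rec.op().equals("delete")) image.remove(rec.path());
        }

        // 3. Write a fresh checkpoint and an empty journal back to the
        //    storage directories, then serve requests from memory.
        saveCheckpoint(image);
        System.out.println("Namespace ready: " + image.keySet());
    }

    static Map<String, String> loadCheckpoint() { return Map.of("/a", "inode"); }

    static List<JournalRecord> loadJournal() {
        return List.of(new JournalRecord("create", "/b"),
                       new JournalRecord("delete", "/a"));
    }

    static void saveCheckpoint(Map<String, String> image) { /* persist to disk */ }
}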
Block allocation and storage in HDFS
Data organization in HDFS is managed similarly to GFS. The namespace is represented by inodes,
which represent files and directories and record attributes such as permissions, modification and
access times, and namespace and disk space quotas. Files are split into blocks of a user-defined size
(default is 128 MB); each block is stored on a DataNode with at least two replicas on other DataNodes
to ensure availability and redundancy, and the user can configure more replicas. Because the storage
locations of block replicas typically change over time, they are not part of the persistent checkpoint.
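Assuming the standard Hadoop FileSystem Java API, a client can choose the block size and replication factor explicitly when creating a file; the NameNode URI and path below are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Create a file with an explicit block size and replication factor.
public class CreateWithBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/data/example.bin");
            short replication = 3;               // one block plus two replicas
            long blockSize = 128L * 1024 * 1024; // 128 MB default
            try (FSDataOutputStream out =
                     fs.create(file, true, 4096, replication, blockSize)) {
                out.writeBytes("hello hdfs");
            }
        }
    }
}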
HDFS client
The thin interface layer that programs use to access data stored in HDFS is called the client.
To read a file, the client first contacts the NameNode to get the locations of the data blocks that
comprise the file, and then reads each block's contents from the DataNode closest to it.
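A minimal read through the client might look as follows, using the Hadoop FileSystem Java API; opening the stream triggers the block-location lookup at the NameNode, and subsequent reads fetch block contents from nearby DataNodes. The URI and path are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Read a file through the HDFS client interface.
public class ReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/data/example.bin"))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n); // stream block contents to stdout
            }
        }
    }
}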
When writing data, the client first asks the NameNode to provide the DataNodes where the data can
be written. The NameNode returns a block to write the data to. As each block is filled, the NameNode
provides additional blocks in a pipeline; the block for each request is not necessarily on the same
DataNode.
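From the client's point of view the write path is symmetric: the application simply streams bytes, while the client library asks the NameNode for new blocks and pipelines replicas to DataNodes behind the scenes. A sketch, again with placeholder URI and path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Write a file large enough to span multiple blocks; block allocation
// and the replication pipeline are handled by the client library.
public class WriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/data/large.bin"))) {
            byte[] chunk = new byte[1 << 20];   // 1 MB buffer of zeros
            for (int i = 0; i < 256; i++) {     // 256 MB spans several 128 MB blocks
                out.write(chunk);
            }
        }
    }
}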
One of the biggest design differentiators of HDFS is the API that exposes the locations of file
blocks. This allows applications such as MapReduce to schedule a task where the data is located, thus
improving I/O performance. The API also includes functionality to set the replication factor for a file.
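Both API features mentioned above are visible in the Hadoop FileSystem Java API: getFileBlockLocations exposes where each block lives, and setReplication changes a file's replication factor. The URI, path, and target replication below are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// List the hosts holding each block of a file, then raise its
// replication factor.
public class LocationsAndReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/data/example.bin");
            FileStatus status = fs.getFileStatus(file);
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        b.getOffset(), b.getLength(),
                        String.join(",", b.getHosts()));
            }
            fs.setReplication(file, (short) 5); // placeholder target factor
        }
    }
}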