HDFS is a file system and, like any other file system architecture, it must manage consistency,
recoverability, and concurrency for reliable operation. The architecture addresses these requirements
with three kinds of files: the image, the journal, and the checkpoint.
Image An image represents the metadata of the namespace (inodes and lists of blocks). On startup,
the NameNode pins the entire namespace image in memory. Keeping the image in memory enables the
NameNode to service multiple client requests concurrently.
Journal The journal is the modification log of the image, kept in the local host's native file system.
During normal operation, each client transaction is recorded in the journal, and the journal file is
flushed and synced before the acknowledgment is sent to the client. On startup or during recovery,
the NameNode can replay this journal.
Checkpoint To enable recovery, a persistent record of the image, called a checkpoint, is also stored
in the local host's native file system. The NameNode never modifies or updates a checkpoint file once
it is written; a new checkpoint file can be created at the next startup, on a restart, or on demand
when requested by the administrator or by the CheckpointNode (described later in this chapter).
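The sync-before-acknowledge discipline described for the journal can be illustrated in a few lines of Java. This is a minimal sketch, not the NameNode's actual edit-log implementation; the class name and record layout are hypothetical.

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Minimal sketch of the journal discipline: each transaction is
// appended to the journal file and flushed/synced to stable storage
// before the client is acknowledged. The record layout is
// hypothetical, not the NameNode's actual edit-log format.
public class JournalSketch {
    private final FileOutputStream file;
    private final DataOutputStream out;

    public JournalSketch(String path) throws IOException {
        this.file = new FileOutputStream(path, true); // append mode
        this.out = new DataOutputStream(file);
    }

    public void logTransaction(byte opCode, String payload) throws IOException {
        out.writeByte(opCode);   // e.g., create, delete, rename
        out.writeUTF(payload);   // affected path or attributes
        out.flush();             // push buffered bytes to the OS
        file.getFD().sync();     // force them to disk
        // Only now may the NameNode acknowledge the client.
    }
}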
HDFS startup
HDFS drives its startup sequence from the image file, which is kept in memory. Each time the
NameNode starts, it initializes the namespace image from the checkpoint file and replays all the
changes from the journal file. Once the startup sequence completes, a new checkpoint and an empty
journal are written back to the storage directories, and the NameNode starts serving client requests.
For improved redundancy and reliability, copies of the checkpoint and journal should be kept on
other servers.
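As a rough illustration of this sequence, the sketch below uses a plain Map as a stand-in for the in-memory namespace image and replays a toy journal; every name in it is illustrative rather than actual NameNode code.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Self-contained sketch of the startup sequence described above.
public class StartupSketch {
    record JournalRecord(String op, String path) {}

    public static void main(String[] args) {
        // 1. Initialize the namespace image from the checkpoint file.
        Map<String, String> image = new HashMap<>(loadCheckpoint());

        // 2. Replay every change recorded in the journal.
        for (JournalRecord rec : loadJournal()) {
            if (rec.op().equals("create"))      image.put(rec.path(), "inode");
            else if (rec.op().equals("delete")) image.remove(rec.path());
        }

        // 3. Write a fresh checkpoint and an empty journal back to the
        //    storage directories, then serve requests from memory.
        saveCheckpoint(image);
        System.out.println("Namespace ready: " + image.keySet());
    }

    static Map<String, String> loadCheckpoint() { return Map.of("/a", "inode"); }

    static List<JournalRecord> loadJournal() {
        return List.of(new JournalRecord("create", "/b"),
                       new JournalRecord("delete", "/a"));
    }

    static void saveCheckpoint(Map<String, String> image) { /* persist to disk */ }
}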
Block allocation and storage in HDFS
Data organization in HDFS is managed similarly to GFS. The namespace is represented by inodes,
which represent files and directories and record attributes such as permissions, modification and
access times, and namespace and disk space quotas. Files are split into blocks of a user-defined size
(default is 128 MB); each block is stored on a DataNode with at least two replicas on other DataNodes
to ensure availability and redundancy, and the user can configure more replicas. Because the storage
locations of block replicas typically change over time, they are not part of the persistent checkpoint.
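Assuming the standard Hadoop FileSystem Java API, a client can choose the block size and replication factor explicitly when creating a file; the NameNode URI and path below are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Create a file with an explicit block size and replication factor.
public class CreateWithBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/data/example.bin");
            short replication = 3;               // one block plus two replicas
            long blockSize = 128L * 1024 * 1024; // 128 MB default
            try (FSDataOutputStream out =
                     fs.create(file, true, 4096, replication, blockSize)) {
                out.writeBytes("hello hdfs");
            }
        }
    }
}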
HDFS client
The thin interface layer that programs use to access data stored in HDFS is called the client.
To read a file, the client first contacts the NameNode to get the locations of the data blocks that
comprise the file, and then reads each block's contents from the DataNode closest to it.
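A minimal read through the client might look as follows, using the Hadoop FileSystem Java API; opening the stream triggers the block-location lookup at the NameNode, and subsequent reads fetch block contents from nearby DataNodes. The URI and path are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Read a file through the HDFS client interface.
public class ReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/data/example.bin"))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n); // stream block contents to stdout
            }
        }
    }
}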
When writing data, the client first asks the NameNode to provide the DataNodes where the data can
be written. The NameNode returns a block to write the data to. As each block is filled, the NameNode
provides additional blocks in a pipeline; the block for each request is not necessarily on the same
DataNode.
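From the client's point of view the write path is symmetric: the application simply streams bytes, while the client library asks the NameNode for new blocks and pipelines replicas to DataNodes behind the scenes. A sketch, again with placeholder URI and path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Write a file large enough to span multiple blocks; block allocation
// and the replication pipeline are handled by the client library.
public class WriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/data/large.bin"))) {
            byte[] chunk = new byte[1 << 20];   // 1 MB buffer of zeros
            for (int i = 0; i < 256; i++) {     // 256 MB spans several 128 MB blocks
                out.write(chunk);
            }
        }
    }
}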
One of the biggest design differentiators of HDFS is the API that exposes the locations of file
blocks. This allows applications such as MapReduce to schedule a task where the data is located, thus
improving I/O performance. The API also includes functionality to set the replication factor for a file.
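Both API features mentioned above are visible in the Hadoop FileSystem Java API: getFileBlockLocations exposes where each block lives, and setReplication changes a file's replication factor. The URI, path, and target replication below are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// List the hosts holding each block of a file, then raise its
// replication factor.
public class LocationsAndReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/data/example.bin");
            FileStatus status = fs.getFileStatus(file);
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        b.getOffset(), b.getLength(),
                        String.join(",", b.getHosts()));
            }
            fs.setReplication(file, (short) 5); // placeholder target factor
        }
    }
}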