FIGURE 4.6
HDFS architecture. (Diagram labels: Metadata, Blocks Management, HDFS Client, NameNode, DataNodes, Data Read & Write, Replicas.)
Cross-platform compatibility: the ability to integrate across multiple architecture platforms.
Compute and storage in one environment: data and computation are colocated in the same
architecture, removing redundant I/O and excessive disk access.
The three principal goals of HDFS architecture are:
1. Process extremely large files ranging from multiple gigabytes to petabytes.
2. Stream data for processing, reading at high-throughput rates and processing data on read.
3. Execute on commodity hardware with no special hardware requirements.
HDFS architecture evolved from the Nutch Distributed File System (NDFS) architecture, which is based on the Google File System (GFS) architecture.
The next section discusses the HDFS architecture.
HDFS architecture
Figure 4.6 shows the overall conceptual architecture of HDFS. The main building blocks of HDFS are:
NameNode (master node)
DataNodes (slave nodes)
Image
Journal
Checkpoint
NameNode. The NameNode is a single master server that manages the file system namespace and
regulates access to files by clients. It also handles namespace operations such as opening,
closing, moving, naming, and renaming files and directories, and it manages the mapping of
blocks to DataNodes.
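The two kinds of metadata described above (the namespace and the block-to-DataNode mapping) can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `ToyNameNode`, its method names, and the DataNode identifiers such as `dn1` are hypothetical and are not part of the HDFS API.

```python
from collections import defaultdict


class ToyNameNode:
    """Illustrative model of NameNode metadata (hypothetical, not the HDFS API)."""

    def __init__(self):
        self.namespace = {}                # file path -> ordered list of block IDs
        self.block_map = defaultdict(set)  # block ID -> DataNodes holding a replica
        self._next_block_id = 0

    def create(self, path):
        """Add an empty file to the namespace."""
        if path in self.namespace:
            raise FileExistsError(path)
        self.namespace[path] = []

    def add_block(self, path, datanodes):
        """Append a new block to a file and record where its replicas live."""
        block_id = self._next_block_id
        self._next_block_id += 1
        self.namespace[path].append(block_id)
        self.block_map[block_id].update(datanodes)
        return block_id

    def rename(self, old_path, new_path):
        """Rename touches only the namespace; block locations are unchanged."""
        if new_path in self.namespace:
            raise FileExistsError(new_path)
        self.namespace[new_path] = self.namespace.pop(old_path)

    def locate(self, path):
        """Return (block ID, sorted DataNode list) pairs for a file, in order."""
        return [(b, sorted(self.block_map[b])) for b in self.namespace[path]]


nn = ToyNameNode()
nn.create("/logs/day1")
nn.add_block("/logs/day1", ["dn1", "dn2", "dn3"])
nn.add_block("/logs/day1", ["dn2", "dn3", "dn4"])
nn.rename("/logs/day1", "/archive/day1")
locations = nn.locate("/archive/day1")
print(locations)
```

In this sketch a rename rewrites one namespace entry and moves no data, which illustrates why it helps to keep the namespace separate from block locations.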
DataNodes. DataNodes are the slaves in the architecture; they manage data and the storage
attached to the nodes they run on. A typical HDFS cluster can have thousands of DataNodes and
tens of thousands of HDFS clients, since each DataNode may execute multiple application tasks
simultaneously. The DataNodes serve read and write requests from the file system's clients,
maintain blocks, and perform replication as directed by the NameNode. Block management in HDFS
differs from that of a normal file system. The size of the data file that holds a block equals
the actual length of the block. If a block is half full, it needs only half the space of a full
block on the local drive. No extra space is consumed to pad the block, unlike in a regular file
system, which keeps storage compact.
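The storage arithmetic behind the half-full block can be worked through in a few lines. This is a minimal sketch: the 128 MB block size, the 192 MB file, and the `replication` parameter are assumptions chosen for illustration, and the function names are hypothetical.

```python
MB = 1024 * 1024


def block_lengths(file_size, block_size):
    """Split a file into block lengths; only the last block may be partial."""
    full_blocks, remainder = divmod(file_size, block_size)
    lengths = [block_size] * full_blocks
    if remainder:
        lengths.append(remainder)
    return lengths


def local_usage(file_size, block_size, replication=1):
    """Disk space used when each block's data file equals the block's length."""
    return sum(block_lengths(file_size, block_size)) * replication


def padded_usage(file_size, block_size, replication=1):
    """Disk space used if every block were rounded up to the full block size."""
    return len(block_lengths(file_size, block_size)) * block_size * replication


block_size = 128 * MB  # assumed block size for this example
file_size = 192 * MB   # one full block plus one half-full block

lengths = block_lengths(file_size, block_size)
print([n // MB for n in lengths])                    # [128, 64]
print(local_usage(file_size, block_size) // MB)      # 192
print(padded_usage(file_size, block_size) // MB)     # 256
```

Under these assumptions the half-full second block occupies 64 MB on the local drive, not 128 MB, so the file consumes 192 MB per replica where a padded layout would consume 256 MB.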
 