block concept as it is carried over from the file system found on your own computer. Blocks, in this context, are the units into which files are split so that they can be written to your hard drive in whatever free space is available.
Many functional similarities exist between your file system's blocks and HDFS blocks. HDFS blocks split files, some of which may be larger than any single drive, so that they can be distributed throughout the cluster and written to each node's disk. HDFS blocks are also much larger than those used by your local file system, defaulting to an initial size of 64MB (and often configured to be much larger).
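The block-splitting arithmetic above can be sketched in a few lines of Python. This is a toy illustration of the concept, not the Hadoop API; the function name and the 64MB constant are chosen here for clarity.

```python
# Toy sketch (not Hadoop code): split a file's byte range into
# HDFS-style fixed-size blocks, using the 64MB default mentioned above.
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The final block may be shorter than the configured block size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 200 MB file yields four blocks: three full 64 MB blocks plus an 8 MB tail.
blocks = split_into_blocks(200 * 1024 * 1024)
```

Note that, unlike many local file systems, HDFS does not pad the final block: a file smaller than the block size consumes only as much disk as it needs.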
Within an HDFS cluster, two roles of machine make up what is often referred to as the master/slave architecture. The first, called the NameNode, functions as the master, or controller, for the entire cluster. It is responsible for maintaining all the HDFS metadata and drives all file system namespace operations. There can be only one NameNode per cluster, and if it is lost or fails, the file system metadata is lost and all the data in the HDFS cluster becomes inaccessible.
The second role within an HDFS cluster is the DataNode. Although there is only one NameNode, there are usually many DataNodes. These nodes interact directly with HDFS clients, taking on the responsibility of storing, reading, and writing data blocks. This makes scaling your cluster easy: you simply add DataNodes to increase capacity. The DataNode is also responsible for replicating data when instructed to do so by the NameNode (more on HDFS replication shortly).
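The NameNode's role of deciding which DataNodes hold each block's replicas can be sketched as a simple placement map. This is a hypothetical illustration only: real HDFS placement is rack-aware, while this sketch just cycles through the available nodes. All names (`place_replicas`, `dn1`, `blk_1`) are invented for the example, and the replication factor of 3 is HDFS's well-known default.

```python
import itertools

# Hypothetical sketch, not Hadoop code: assign each block to REPLICATION
# DataNodes by cycling round-robin through the cluster's node list.
REPLICATION = 3

def place_replicas(block_ids, datanodes, replication=REPLICATION):
    """Return {block_id: [datanode, ...]}, replication nodes per block."""
    placement = {}
    ring = itertools.cycle(datanodes)
    for block_id in block_ids:
        placement[block_id] = [next(ring) for _ in range(replication)]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_1", "blk_2"], nodes)
# blk_1 -> dn1, dn2, dn3; blk_2 -> dn4, dn1, dn2
```

The point of the sketch is only that the NameNode holds the block-to-node mapping; the DataNodes then copy the actual bytes among themselves when told to.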
HDFS Read and Write Operations
To get a better understanding of how these parts or pieces fit together,
Figure 4.1 and Figure 4.2 illustrate how a client reads from and writes to an
HDFS cluster.
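The read path can be summarized in miniature: the client first asks the NameNode which DataNodes hold each of the file's blocks, then fetches each block's bytes from one of its replicas. The sketch below simulates that two-step flow with plain dictionaries; every path, node name, and block ID is invented for illustration, and real HDFS adds failure handling, checksums, and locality-aware replica selection that are omitted here.

```python
# Hedged simulation of the HDFS read path: a dict standing in for the
# NameNode's metadata, and dicts standing in for each DataNode's disk.
namenode_metadata = {
    "/logs/app.log": {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]},
}
datanode_storage = {
    "dn1": {"blk_1": b"first block "},
    "dn2": {"blk_1": b"first block ", "blk_2": b"second block"},
    "dn3": {"blk_2": b"second block"},
}

def read_file(path):
    """Step 1: ask the 'NameNode' for block locations.
    Step 2: pull each block from the first listed replica."""
    data = b""
    for block_id, replicas in namenode_metadata[path].items():
        data += datanode_storage[replicas[0]][block_id]
    return data

content = read_file("/logs/app.log")
```

Writes follow the mirror image of this flow: the client asks the NameNode where to place each new block, then streams the bytes to the chosen DataNodes.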