block concept as it is carried over from the file system found on your own computer. Blocks, in this context, are the units into which files are split so that they can be written to your hard drive in whatever free space is available.
Many functional similarities exist between your file system's blocks and HDFS blocks. HDFS blocks split files, some of which may be larger than any single drive, so that they can be distributed throughout the cluster and written to each node's disk. HDFS blocks are also much larger than those used by your local file system, defaulting to an initial size of 64MB (and often configured to be much larger).
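The block-splitting arithmetic above can be sketched in a few lines of Python. This is a toy illustration of the concept, not the Hadoop API; the function name and the 64MB constant are chosen here for clarity.

```python
# Toy sketch (not Hadoop code): split a file's byte range into
# HDFS-style fixed-size blocks, using the 64MB default mentioned above.
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The final block may be shorter than the configured block size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 200 MB file yields four blocks: three full 64 MB blocks plus an 8 MB tail.
blocks = split_into_blocks(200 * 1024 * 1024)
```

Note that, unlike many local file systems, HDFS does not pad the final block: a file smaller than the block size consumes only as much disk as it needs.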
Within an HDFS cluster, two roles of machine make up what is often referred to as the master/slave architecture. The first, called the NameNode, functions as the master, or controller, for the entire cluster. It is responsible for maintaining all the HDFS metadata and drives all file system namespace operations. There can be only one NameNode per cluster, and if it is lost or fails, the file system metadata is lost and all the data in the HDFS cluster becomes inaccessible.
The second role within an HDFS cluster is the DataNode. Although there is only one NameNode, there are usually many DataNodes. These nodes interact directly with HDFS clients, taking on the responsibility of storing, reading, and writing data blocks. This makes scaling your cluster easy: you simply add DataNodes to increase capacity. The DataNode is also responsible for replicating data when instructed to do so by the NameNode (more on HDFS replication shortly).
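The NameNode's role of deciding which DataNodes hold each block's replicas can be sketched as a simple placement map. This is a hypothetical illustration only: real HDFS placement is rack-aware, while this sketch just cycles through the available nodes. All names (`place_replicas`, `dn1`, `blk_1`) are invented for the example, and the replication factor of 3 is HDFS's well-known default.

```python
import itertools

# Hypothetical sketch, not Hadoop code: assign each block to REPLICATION
# DataNodes by cycling round-robin through the cluster's node list.
REPLICATION = 3

def place_replicas(block_ids, datanodes, replication=REPLICATION):
    """Return {block_id: [datanode, ...]}, replication nodes per block."""
    placement = {}
    ring = itertools.cycle(datanodes)
    for block_id in block_ids:
        placement[block_id] = [next(ring) for _ in range(replication)]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_1", "blk_2"], nodes)
# blk_1 -> dn1, dn2, dn3; blk_2 -> dn4, dn1, dn2
```

The point of the sketch is only that the NameNode holds the block-to-node mapping; the DataNodes then copy the actual bytes among themselves when told to.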
HDFS Read and Write Operations
To get a better understanding of how these parts or pieces fit together,
Figure 4.1 and Figure 4.2 illustrate how a client reads from and writes to an
HDFS cluster.
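The read path can be summarized in miniature: the client first asks the NameNode which DataNodes hold each of the file's blocks, then fetches each block's bytes from one of its replicas. The sketch below simulates that two-step flow with plain dictionaries; every path, node name, and block ID is invented for illustration, and real HDFS adds failure handling, checksums, and locality-aware replica selection that are omitted here.

```python
# Hedged simulation of the HDFS read path: a dict standing in for the
# NameNode's metadata, and dicts standing in for each DataNode's disk.
namenode_metadata = {
    "/logs/app.log": {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]},
}
datanode_storage = {
    "dn1": {"blk_1": b"first block "},
    "dn2": {"blk_1": b"first block ", "blk_2": b"second block"},
    "dn3": {"blk_2": b"second block"},
}

def read_file(path):
    """Step 1: ask the 'NameNode' for block locations.
    Step 2: pull each block from the first listed replica."""
    data = b""
    for block_id, replicas in namenode_metadata[path].items():
        data += datanode_storage[replicas[0]][block_id]
    return data

content = read_file("/logs/app.log")
```

Writes follow the mirror image of this flow: the client asks the NameNode where to place each new block, then streams the bytes to the chosen DataNodes.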