HDFS, Hive, HBase, and HCatalog - Microsoft Big Data Solutions

Database Reference

In-Depth Information

Figure 4.2 HDFS write operation

When a client needs to read a data set from HDFS (see Figure 4.1 ), it

must first contact the NameNode. The NameNode contains all the metadata

associated with the data set or file, including its blocks and all the block

storage locations. The NameNode passes this metadata or information back

to the client. The client then subsequently requests the data directly for each

DataNode.

The pattern of this interaction is important. Although the NameNode acts

as a gatekeeper, after the metadata is provided to the client, it gets out of

the way and allows the client to interact directly with the DataNode. This

provides the foundation for the extremely high level of throughput in HDFS.

Much like the read operation, an HDFS write-operation begins with the

NameNode (see Figure 4.2 ). First, the client requests that the NameNode

create a new empty file with no blocks associated with it. Next, the client

streams data directly to the DataNode using an internal queue to reliably

manage the process of sending and acknowledging the data.

Search WWH ::

Custom Search

Home