Reading files in HDFS
The following steps occur when a client tries to read a file in HDFS:
1. The client contacts the namenode daemon to get the location of the data blocks of
the file it wants to read.
2. The namenode daemon returns the list of addresses of the datanodes for the data
blocks.
3. For each data block, HDFS tries to return the datanode holding a copy of the block
that is closest to the client. Here, closest refers to the network proximity between
the datanode daemon and the client.
4. Once the client has the list, it connects to the closest datanode daemon and starts
reading the data block using a stream.
5. After the block is read completely, the connection to that datanode is terminated.
The datanode daemon that hosts the next block in the sequence is then identified, and
that block is streamed in the same way. This continues until the last data block of
the file is read.
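The steps above can be sketched as a small in-memory simulation. This is only an illustrative model, not Hadoop's client code: the `DataNode`, `NameNode`, `distance`, and `read_file` names are hypothetical, the `storage` dictionary stands in for the datanodes' disks, and the distance values (0 for the same host, 2 for the same rack, 4 otherwise) are assumed purely to show closest-first ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    """Hypothetical stand-in for a datanode daemon."""
    name: str
    rack: str


def distance(client_host, client_rack, node):
    """Toy proximity metric (assumed values): same host < same rack < elsewhere."""
    if node.name == client_host:
        return 0
    if node.rack == client_rack:
        return 2
    return 4


class NameNode:
    """Holds metadata only: which datanodes store each block of each file."""

    def __init__(self, block_map):
        # block_map: path -> ordered list of (block_id, [DataNode, ...])
        self.block_map = block_map

    def get_block_locations(self, path, client_host, client_rack):
        # Steps 1-3: for every block, return its datanodes sorted closest-first.
        return [
            (block_id,
             sorted(nodes, key=lambda n: distance(client_host, client_rack, n)))
            for block_id, nodes in self.block_map[path]
        ]


def read_file(namenode, storage, path, client_host, client_rack):
    """Read all blocks of `path` in order, each from its closest datanode.

    storage maps (datanode name, block_id) -> bytes and simulates block data.
    Returns the file content and the list of datanodes that were contacted.
    """
    data = bytearray()
    contacted = []
    locations = namenode.get_block_locations(path, client_host, client_rack)
    for block_id, nodes in locations:
        closest = nodes[0]                         # step 4: pick the closest datanode
        contacted.append(closest.name)
        data += storage[(closest.name, block_id)]  # stream the whole block
        # Step 5: the "connection" ends here; the loop moves to the next block.
    return bytes(data), contacted


# Example: a two-block file, each block stored on two datanodes.
dn1 = DataNode("dn1", "rack1")
dn2 = DataNode("dn2", "rack1")
dn3 = DataNode("dn3", "rack2")

namenode = NameNode({
    "/logs/a.txt": [("blk_1", [dn3, dn2]), ("blk_2", [dn3, dn1])],
})
storage = {
    ("dn2", "blk_1"): b"hello ", ("dn3", "blk_1"): b"hello ",
    ("dn1", "blk_2"): b"world", ("dn3", "blk_2"): b"world",
}

content, contacted = read_file(
    namenode, storage, "/logs/a.txt", client_host="dn1", client_rack="rack1"
)
print(content, contacted)
```

In this sketch the client runs on `dn1`, so the first block is read from `dn2` (same rack) and the second from `dn1` itself. The namenode object never touches block data; it only answers the location query, which mirrors the division of work described in the steps.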
The following diagram shows the read operation of a file in HDFS: