Data Flow
Anatomy of a File Read
To get an idea of how data flows between the client interacting with HDFS, the namenode,
and the datanodes, consider Figure 3-2, which shows the main sequence of events when
reading a file.
Figure 3-2. A client reading data from HDFS
The client opens the file it wishes to read by calling open() on the FileSystem object,
which for HDFS is an instance of DistributedFileSystem (step 1 in Figure 3-2).
DistributedFileSystem calls the namenode, using remote procedure calls (RPCs),
to determine the locations of the first few blocks in the file (step 2). For each block, the
namenode returns the addresses of the datanodes that have a copy of that block. Furthermore,
the datanodes are sorted according to their proximity to the client (according to the
topology of the cluster's network; see Network Topology and Hadoop). If the client is itself a
datanode (in the case of a MapReduce task, for instance), the client will read from the local
datanode if that datanode hosts a copy of the block (see also Figure 2-2 and Short-circuit
local reads).
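To make step 2 concrete, here is a minimal plain-Java sketch of how replica addresses could be ordered by proximity. It assumes a three-level topology in which each datanode is named by a path such as /d1/r1/n1 (data center, rack, node) and takes network distance to be the number of hops from each node up to their closest common ancestor. This is not the HDFS client or namenode code: BlockLocationSketch, distance, and getBlockLocations are hypothetical names, and the namenode's replica map is simply passed in as a list of lists.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BlockLocationSketch {

    // Network distance in a tree-shaped topology where every node is named by a
    // path such as "/d1/r1/n1" (data center / rack / node). The distance is the
    // number of hops from each node up to their closest common ancestor:
    // 0 = same node, 2 = same rack, 4 = same data center, 6 = different data centers.
    public static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int common = 0;
        while (common < pa.length && common < pb.length
                && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    // Hypothetical stand-in for the namenode lookup in step 2: for each of the
    // first maxBlocks blocks, return the replica holders ordered by distance to
    // the client. List.sort is stable, so equally distant replicas keep input order.
    public static List<List<String>> getBlockLocations(
            List<List<String>> replicasPerBlock, String client, int maxBlocks) {
        List<List<String>> result = new ArrayList<>();
        int n = Math.min(maxBlocks, replicasPerBlock.size());
        for (int i = 0; i < n; i++) {
            List<String> sorted = new ArrayList<>(replicasPerBlock.get(i));
            sorted.sort(Comparator.comparingInt(dn -> distance(client, dn)));
            result.add(sorted);
        }
        return result;
    }

    public static void main(String[] args) {
        // The client runs on datanode n1 in rack r1 of data center d1,
        // as a MapReduce task might.
        String client = "/d1/r1/n1";
        List<List<String>> replicas = List.of(
                List.of("/d2/r3/n7", "/d1/r2/n4", "/d1/r1/n1"),  // block 0
                List.of("/d1/r2/n5", "/d1/r1/n2", "/d2/r3/n8"),  // block 1
                List.of("/d1/r1/n3", "/d2/r4/n9", "/d1/r2/n6")); // block 2
        List<List<String>> locations = getBlockLocations(replicas, client, 2);
        for (int i = 0; i < locations.size(); i++) {
            System.out.println("block " + i + " -> " + locations.get(i));
        }
    }
}
```

Running main prints `block 0 -> [/d1/r1/n1, /d1/r2/n4, /d2/r3/n7]` and `block 1 -> [/d1/r1/n2, /d1/r2/n5, /d2/r3/n8]`. For block 0 the client's own datanode comes first (distance 0), so the read would be local; for block 1, which the client does not host, a rack-local replica leads. Block 2 is omitted because only the first maxBlocks blocks are looked up, mirroring the "first few blocks" the namenode returns in a single call.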