Figure 3-4. A client writing data to HDFS
The client creates the file by calling create() on DistributedFileSystem (step
1 in Figure 3-4). DistributedFileSystem makes an RPC call to the namenode to
create a new file in the filesystem's namespace, with no blocks associated with it (step 2).
The namenode performs various checks to make sure the file doesn't already exist and
that the client has the right permissions to create the file. If these checks pass, the
namenode makes a record of the new file; otherwise, file creation fails and the client is
thrown an IOException. The DistributedFileSystem returns an FSDataOutputStream
for the client to start writing data to. Just as in the read case, FSDataOutputStream
wraps a DFSOutputStream, which handles communication with the
datanodes and namenode.
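The checks described above can be pictured with a small toy model. This is only a sketch and not Hadoop's actual namenode code: the class ToyNamenode, its methods, and the simple "set of allowed writers" permission scheme are all hypothetical, chosen to show the order of events (check existence, check permission, record the file with no blocks, or fail with an IOException).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the create-file checks; hypothetical, not Hadoop's implementation. */
class ToyNamenode {
    // Namespace: file path -> list of block IDs (empty right after creation).
    private final Map<String, List<Long>> namespace = new HashMap<>();
    // Assumed permission model: the set of users allowed to create files.
    private final Set<String> writers = new HashSet<>();

    void grantWrite(String user) {
        writers.add(user);
    }

    /** Records a new file with no blocks, or throws IOException if a check fails. */
    void create(String user, String path) throws IOException {
        if (namespace.containsKey(path)) {
            throw new IOException("File already exists: " + path);
        }
        if (!writers.contains(user)) {
            throw new IOException("Permission denied for user: " + user);
        }
        // Both checks passed: record the file; blocks are allocated later, during writes.
        namespace.put(path, new ArrayList<>());
    }

    /** Returns how many blocks are currently associated with the file. */
    int blockCount(String path) {
        List<Long> blocks = namespace.get(path);
        if (blocks == null) {
            throw new IllegalArgumentException("No such file: " + path);
        }
        return blocks.size();
    }

    public static void main(String[] args) {
        ToyNamenode namenode = new ToyNamenode();
        namenode.grantWrite("alice");
        try {
            namenode.create("alice", "/data/example.txt");
            System.out.println("blocks after create: " + namenode.blockCount("/data/example.txt"));
            // A second create of the same path fails the existence check.
            namenode.create("alice", "/data/example.txt");
        } catch (IOException e) {
            System.out.println("create failed: " + e.getMessage());
        }
    }
}
```

In the real system the client does not talk to the namenode this directly; the call goes through DistributedFileSystem over RPC, and the failure surfaces to the client as the IOException mentioned above.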
As the client writes data (step 3), the DFSOutputStream splits it into packets, which it
writes to an internal queue called the data queue. The data queue is consumed by the
DataStreamer, which is responsible for asking the namenode to allocate new blocks
by picking a list of suitable datanodes to store the replicas. The list of datanodes forms a
pipeline, and here we'll assume the replication level is three, so there are three nodes in