Figure 3-4. A client writing data to HDFS
The client creates the file by calling create() on DistributedFileSystem (step 1 in Figure 3-4). DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem's namespace, with no blocks associated with it (step 2). The namenode performs various checks to make sure the file doesn't already exist and that the client has the right permissions to create the file. If these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and an IOException is thrown to the client. DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to. Just as in the read case, FSDataOutputStream wraps a DFSOutputStream, which handles communication with the datanodes and namenode.
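To make steps 1 and 2 concrete, here is a toy model in plain Java. It is not the Hadoop API: ToyNamenode, ToyDfsOutputStream, and the static create() helper are hypothetical names, the namenode is an in-memory map rather than an RPC server, and the permission check is reduced to a set of allowed users. What it does show is the ordering described above: the namespace checks run first, a successful create records the file with no blocks, a failed one surfaces as an IOException, and only then does the client receive a DataOutputStream wrapping an inner stream. (The real FSDataOutputStream is itself a java.io.DataOutputStream subclass, wrapping a DFSOutputStream.)

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CreateFlowSketch {

    // Hypothetical stand-in for the namenode's namespace (no RPC, no persistence).
    public static class ToyNamenode {
        // Maps a path to its block IDs; a freshly created file has an empty list.
        private final Map<String, List<Long>> namespace = new HashMap<>();
        // Users allowed to create files: a crude stand-in for real permission checks.
        private final Set<String> writers = new HashSet<>();

        public ToyNamenode(Set<String> writers) {
            this.writers.addAll(writers);
        }

        // Models step 2: run the checks, then record the file with no blocks.
        public void create(String path, String user) throws IOException {
            if (namespace.containsKey(path)) {
                throw new IOException("File already exists: " + path);
            }
            if (!writers.contains(user)) {
                throw new IOException("Permission denied: user=" + user);
            }
            namespace.put(path, new ArrayList<>());
        }

        public boolean exists(String path) {
            return namespace.containsKey(path);
        }

        public int blockCount(String path) {
            return namespace.get(path).size();
        }
    }

    // Hypothetical stand-in for DFSOutputStream; it only buffers bytes in memory
    // instead of talking to datanodes.
    public static class ToyDfsOutputStream extends ByteArrayOutputStream {
    }

    // Models step 1 on the client: ask the namenode to create the file, and only
    // if that succeeds return a DataOutputStream wrapping the inner stream.
    public static DataOutputStream create(ToyNamenode namenode, String path, String user,
                                          ToyDfsOutputStream inner) throws IOException {
        namenode.create(path, user); // propagates the IOException if a check fails
        return new DataOutputStream(inner);
    }

    public static void main(String[] args) throws IOException {
        ToyNamenode namenode = new ToyNamenode(Set.of("alice"));
        ToyDfsOutputStream inner = new ToyDfsOutputStream();

        try (DataOutputStream out = create(namenode, "/user/alice/data.txt", "alice", inner)) {
            out.writeBytes("hello"); // bytes pass through the wrapper to the inner stream
        }
        // The toy never allocates blocks, so the recorded file still has none.
        System.out.println(namenode.blockCount("/user/alice/data.txt")); // 0
        System.out.println(inner.size());                                 // 5

        try {
            create(namenode, "/user/alice/data.txt", "alice", new ToyDfsOutputStream());
        } catch (IOException e) {
            System.out.println(e.getMessage()); // File already exists: /user/alice/data.txt
        }
        try {
            create(namenode, "/user/bob/data.txt", "bob", new ToyDfsOutputStream());
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Permission denied: user=bob
        }
    }
}
```

In a real client the equivalent call is FileSystem.create(Path), which for an hdfs:// URI is served by DistributedFileSystem. The toy checks existence before permissions purely for simplicity; it makes no claim about the order the real namenode uses.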
As the client writes data (step 3), the DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the namenode to allocate new blocks; the namenode does so by picking a list of suitable datanodes to store the replicas. The list of datanodes forms a pipeline, and here we'll assume the replication level is three, so there are three nodes in