Writing files in HDFS
The following sequence of steps occurs when a client tries to write a file to HDFS (a client-side code sketch of this flow follows the list):
1. The client informs the namenode daemon that it wants to write a file. The namenode daemon checks whether the file already exists.
2. If the file already exists, an error message is sent back to the client. If it does not, the namenode daemon makes a metadata entry for the new file.
3. The file to be written is split into data packets at the client end, and a data queue is built. The packets in the queue are then streamed to the datanodes in the cluster.
4. The namenode daemon supplies the list of target datanodes, prepared according to the configured replication factor. A pipeline is built to perform the writes to all the datanodes in this list.
5. The first packet from the data queue is transferred to the first datanode daemon. The packet is stored on the first datanode daemon and is then copied to the next datanode daemon in the pipeline. This process continues until the packet is written to the last datanode daemon in the pipeline.
6. The sequence is repeated for all the packets in the data queue. For every packet written to the datanode daemons, a corresponding acknowledgement is sent back to the client.
7. If a packet fails to write to one of the datanodes, that datanode daemon is removed from the pipeline, and the remaining packets are written to the healthy datanodes. The namenode daemon notices the under-replication of the block and arranges for another datanode daemon to which the block can be replicated.
8. After all the packets are written, the client performs a close action, indicating that the packets in the data queue have been completely transferred.
9. The client informs the namenode daemon that the write operation is now complete.
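To make this flow concrete, the following is a minimal client-side sketch using the Hadoop Java FileSystem API. The cluster address, file path, and replication factor are illustrative assumptions; the packet queue, pipeline, and acknowledgements described in steps 3 through 7 all happen inside the client library and the datanodes once create() and write() are called.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed cluster address; adjust fs.defaultFS for your setup.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/example.txt"); // illustrative path

        // Steps 1 and 2: consult the namenode's metadata before creating.
        if (fs.exists(file)) {
            System.err.println("File already exists: " + file);
            return;
        }

        // Steps 3 to 7 happen behind this call: the client splits the
        // stream into packets and pushes them through the datanode
        // pipeline, whose length is set by the replication factor (3 here).
        try (FSDataOutputStream out = fs.create(file, (short) 3)) {
            out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
        }
        // Steps 8 and 9: closing the stream flushes any remaining packets
        // and informs the namenode that the write is complete.
    }
}

Note that closing the stream is what carries out steps 8 and 9; a file whose stream is never closed is not finalized at the namenode.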
The following diagram shows the data block replication process across the datanodes during a write operation in HDFS:

[Figure: data block replication across the datanodes during an HDFS write]
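The replication behavior described in steps 4 and 7 can also be observed from the client side. Below is a small sketch, reusing the same illustrative path, that reads the replication factor recorded in the namenode's metadata and asks for it to be raised; the namenode then schedules the extra copies in the background, much as it does after a pipeline failure.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReplicationCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/example.txt"); // illustrative path

        // The replication factor is part of the file's metadata entry
        // held by the namenode daemon.
        short current = fs.getFileStatus(file).getReplication();
        System.out.println("Current replication factor: " + current);

        // Raising the factor marks the blocks as under-replicated; the
        // namenode arranges additional datanodes to hold the new copies.
        fs.setReplication(file, (short) (current + 1));
    }
}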