Storing and Managing Data in HDFS - Microsoft Big Data Solutions

By default, HDFS keeps three replicas of every file. However, the replication

level can also be specified per file. This can prove useful if you are storing

transient data or data that can be re-created easily. This type of data might

not be replicated at all, or only replicated once. The file is replicated at the

block level. Therefore, a single file may be (and likely is) made up of blocks

stored on multiple nodes. The replicas of these blocks may be stored on still

more nodes.
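
To make the block-level arithmetic concrete, here is a minimal Python sketch that counts how many blocks a file occupies and how many block replicas the cluster ends up storing. The 128 MB block size and the function name are assumptions chosen for illustration; the actual block size depends on the cluster configuration.

```python
# Assumed block size for illustration only; real clusters may use another value.
ASSUMED_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB


def block_replica_count(file_size_bytes, replication=3, block_size=ASSUMED_BLOCK_SIZE):
    """Return (number of blocks, total block replicas stored in the cluster)."""
    if file_size_bytes < 0 or replication < 1 or block_size < 1:
        raise ValueError("size must be >= 0; replication and block size must be >= 1")
    # Ceiling division: a partially filled final block still counts as a block.
    blocks = -(-file_size_bytes // block_size)
    return blocks, blocks * replication


if __name__ == "__main__":
    one_gb = 1024 * 1024 * 1024
    print(block_replica_count(one_gb))                 # default replication factor
    print(block_replica_count(one_gb, replication=1))  # single copy of each block
```

Lowering the replication factor for transient data shrinks the second number proportionally, which is the storage saving the text refers to.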

Replicas are created as the client writes the data to HDFS. The first

DataNode to receive the data gets it in small chunks. Each chunk is written

to the DataNode's local storage and then transferred to the next DataNode.

The receiving DataNode carries out the same process, forwarding the

processed chunks to the next DataNode. That process is repeated for each

chunk, for each DataNode, until the required number of replicas has been

created. Because a node can be receiving a chunk to process at the same time

that it is sending another chunk to the next node, the process is said to be

pipelined.
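
The pipelining described above can be sketched as a small simulation. This is a simplified model, not HDFS code: it assumes every chunk takes one time step per DataNode and ignores acknowledgements and failures. The function name and the node and chunk names are hypothetical.

```python
def pipeline_write(chunks, datanodes):
    """Simulate a chunk-by-chunk replication pipeline.

    Returns (storage, ticks): storage maps each DataNode name to the chunks
    it wrote, and ticks is the number of time steps the whole write took.
    """
    if not datanodes:
        raise ValueError("at least one DataNode is required")
    storage = {name: [] for name in datanodes}
    # in_flight[i] is the chunk arriving at datanodes[i] during the current tick.
    in_flight = [None] * len(datanodes)
    source = iter(chunks)
    ticks = 0
    while True:
        # The client hands the next chunk to the first node, while every other
        # node receives the chunk its predecessor handled on the previous tick.
        in_flight = [next(source, None)] + in_flight[:-1]
        if all(chunk is None for chunk in in_flight):
            break  # nothing left to send or forward
        for name, chunk in zip(datanodes, in_flight):
            if chunk is not None:
                storage[name].append(chunk)  # write to local storage
        ticks += 1
    return storage, ticks


if __name__ == "__main__":
    storage, ticks = pipeline_write(["c0", "c1", "c2"], ["dn1", "dn2", "dn3"])
    print(storage)
    print("ticks:", ticks)
```

In this model, three chunks written to three nodes finish in five time steps (chunks + nodes - 1) rather than the nine that would be needed if each chunk had to reach every node before the next chunk started.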

A key aspect of the data replication capabilities in HDFS is that the replica

placement is optimized and continues to be improved. The replication

process is rack aware; that is, it understands how the computers are

physically organized. For data centers with large numbers of computers, it

is common to use network racks to hold the computers. Often, each rack

has its own network switch to handle network communication between

computers in the rack. This switch would then be connected to another

switch, which is connected to other network racks. This means that

communication between computers in the same rack is generally

faster than communication between computers in different racks.
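
One way to picture rack awareness is to treat the cluster as a tree (switch, rack, node) and measure how far apart two computers are in that tree. The sketch below is an illustration under that assumption; the path format and node names are hypothetical, and it is not the actual Hadoop implementation.

```python
def network_distance(path_a, path_b):
    """Count the hops from each node up to their closest common ancestor.

    Paths are hypothetical locations of the form "/rack/node".
    """
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Hops up from each side to the shared ancestor, added together.
    return (len(a) - common) + (len(b) - common)


if __name__ == "__main__":
    print(network_distance("/rack1/node1", "/rack1/node1"))  # same node
    print(network_distance("/rack1/node1", "/rack1/node2"))  # same rack
    print(network_distance("/rack1/node1", "/rack2/node3"))  # different racks
```

A smaller distance stands in for the cheaper, faster path through a single rack switch; a larger distance means the traffic has to cross between racks.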

HDFS uses its rack awareness to optimize the placement of replicas within a

cluster. In doing so, it balances the need for performance with the need for

availability in the case of a hardware failure. In the common scenario, with

three replicas, one replica is stored on a node in the local rack. The other two

replicas will be stored in a remote rack, on two different nodes in the rack.
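
The three-replica placement just described can be sketched as follows. This is a minimal illustration, assuming the topology is given as a plain dictionary of rack names to node names; the function name, rack names, and random selection are assumptions, and a real cluster also weighs factors such as free space and node load.

```python
import random


def place_replicas(topology, local_rack, rng=random):
    """Pick three (rack, node) locations: one local, two in one remote rack.

    topology is a hypothetical dict mapping rack name -> list of node names.
    """
    if not topology.get(local_rack):
        raise ValueError("local rack is unknown or has no nodes")
    # A remote rack must offer two different nodes for the second and third replica.
    remote_racks = [rack for rack, nodes in topology.items()
                    if rack != local_rack and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("no remote rack with at least two nodes")
    first = (local_rack, rng.choice(topology[local_rack]))
    remote = rng.choice(remote_racks)
    second_node, third_node = rng.sample(topology[remote], 2)
    return [first, (remote, second_node), (remote, third_node)]


if __name__ == "__main__":
    topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4", "n5"]}
    print(place_replicas(topology, "rack1"))
```

Whatever the random choices, the result always spans exactly two racks and three distinct nodes, which is the property the rest of the discussion relies on.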

This approach still delivers good read performance, because a client reading

the file can access two unique racks (with their own network connection)

for the contents of the file. It also delivers good write performance, because

writing a replica to a node in the same rack is significantly faster than

writing it to a node in a different rack. This also balances availability; the

replicas are located in two separate racks and three nodes. Rack failures are
