Database Reference
In-Depth Information
REPLICA PLACEMENT
How does the namenode choose which datanodes to store replicas on? There's a trade-off between reli-
ability and write bandwidth and read bandwidth here. For example, placing all replicas on a single node
incurs the lowest write bandwidth penalty (since the replication pipeline runs on a single node), but this
offers no real redundancy (if the node fails, the data for that block is lost). Also, the read bandwidth is
high for off-rack reads. At the other extreme, placing replicas in different data centers may maximize re-
dundancy, but at the cost of bandwidth. Even in the same data center (which is what all Hadoop clusters
to date have run in), there are a variety of possible placement strategies.
Hadoop's default strategy is to place the first replica on the same node as the client (for clients running
outside the cluster, a node is chosen at random, although the system tries not to pick nodes that are too
full or too busy). The second replica is placed on a different rack from the first ( off-rack ), chosen at ran-
dom. The third replica is placed on the same rack as the second, but on a different node chosen at ran-
dom. Further replicas are placed on random nodes in the cluster, although the system tries to avoid pla-
cing too many replicas on the same rack.
Once the replica locations have been chosen, a pipeline is built, taking network topology into account.
For a replication factor of 3, the pipeline might look like Figure 3-5 .
Figure 3-5. A typical replica pipeline
Overall, this strategy gives a good balance among reliability (blocks are stored on two racks), write
bandwidth (writes only have to traverse a single network switch), read performance (there's a choice of
two racks to read from), and block distribution across the cluster (clients only write a single block on the
local rack).
Coherency Model
A coherency model for a filesystem describes the data visibility of reads and writes for a
file. HDFS trades off some POSIX requirements for performance, so some operations may
behave differently than you expect them to.
After creating a file, it is visible in the filesystem namespace, as expected:
Search WWH ::




Custom Search