If you specify the listblocks parameter, http://datanode:50075/blockScannerReport?listblocks,
the report is preceded by a list of all the blocks on the datanode along with
their latest verification status. Here is a snippet of the block list (lines are split to fit the
page):
blk_6035596358209321442 : status : ok type : none scan time :
    0 not yet verified
blk_3065580480714947643 : status : ok type : remote scan time :
    1215755306400 2008-07-11 05:48:26,400
blk_8729669677359108508 : status : ok type : local scan time :
    1215755727345 2008-07-11 05:55:27,345
The first column is the block ID, followed by some key-value pairs. The status can be one
of failed or ok, according to whether the last scan of the block detected a checksum error.
The type of scan is local if it was performed by the background thread, remote if
it was performed by a client or a remote datanode, or none if a scan of this block has yet
to be made. The last piece of information is the scan time, which is displayed as the number
of milliseconds since midnight on January 1, 1970, and also as a more readable value.
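To make the entry layout concrete, here is a minimal Python sketch that parses one block-list entry and converts its scan time into the readable form. It is an illustration only: the regular expression, the parse_entry and format_scan_time names, and the choice of UTC are assumptions made here, not part of Hadoop, and the sketch expects the two halves of a split entry to have been joined back into one line. UTC happens to reproduce the readable value in the sample above; a real datanode may print times in its own time zone.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for one logical entry (split lines already re-joined).
# It tolerates a leading minus sign in the numeric part of the block ID.
ENTRY = re.compile(
    r"(?P<block>blk_-?\d+)\s*:\s*status\s*:\s*(?P<status>\w+)"
    r"\s+type\s*:\s*(?P<type>\w+)"
    r"\s+scan time\s*:\s*(?P<millis>\d+)"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_entry(line):
    """Return the fields of one block-list entry as a dict."""
    match = ENTRY.search(line)
    if match is None:
        raise ValueError("not a block-list entry: %r" % line)
    millis = int(match.group("millis"))
    # A scan time of 0 means the block has not yet been verified.
    scanned = None if millis == 0 else EPOCH + timedelta(milliseconds=millis)
    return {
        "block": match.group("block"),
        "status": match.group("status"),  # "ok" or "failed"
        "type": match.group("type"),      # "local", "remote", or "none"
        "scan_time": scanned,
    }


def format_scan_time(moment):
    """Render a datetime as 'YYYY-MM-DD HH:MM:SS,mmm' (assumes UTC)."""
    return moment.strftime("%Y-%m-%d %H:%M:%S") + ",%03d" % (moment.microsecond // 1000)


if __name__ == "__main__":
    sample = ("blk_3065580480714947643 : status : ok type : remote scan time : "
              "1215755306400 2008-07-11 05:48:26,400")
    entry = parse_entry(sample)
    print(entry["block"], entry["status"], entry["type"],
          format_scan_time(entry["scan_time"]))
```

Integer millisecond arithmetic is used instead of dividing by 1000 so that the millisecond part is not disturbed by floating-point rounding.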
Balancer
Over time, the distribution of blocks across datanodes can become unbalanced. An unbalanced
cluster can affect locality for MapReduce, and it puts a greater strain on the highly
utilized datanodes, so it's best avoided.
The balancer program is a Hadoop daemon that redistributes blocks by moving them from
overutilized datanodes to underutilized datanodes, while adhering to the block replica
placement policy that makes data loss unlikely by placing block replicas on different racks
(see Replica Placement). It moves blocks until the cluster is deemed to be balanced, which
means that the utilization of every datanode (ratio of used space on the node to total
capacity of the node) differs from the utilization of the cluster (ratio of used space on the
cluster to total capacity of the cluster) by no more than a given threshold percentage. You
can start the balancer with:
% start-balancer.sh
The -threshold argument specifies the threshold percentage that defines what it
means for the cluster to be balanced. The flag is optional; if omitted, the threshold is 10%.
At any one time, only one balancer may be running on the cluster.
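The balance criterion described above can be sketched in a few lines of Python. This is not the balancer's own code, only an illustration of the definition: the is_balanced and utilization names and the (used, capacity) tuple layout are assumptions introduced here, and the default of 10 mirrors the default threshold percentage.

```python
def utilization(used, capacity):
    """Used space as a percentage of total capacity."""
    return 100.0 * used / capacity


def is_balanced(nodes, threshold=10.0):
    """Check the balance criterion for a list of (used, capacity) pairs.

    The cluster counts as balanced when every datanode's utilization
    differs from the cluster-wide utilization by at most `threshold`
    percentage points.
    """
    total_used = sum(used for used, _ in nodes)
    total_capacity = sum(capacity for _, capacity in nodes)
    cluster = utilization(total_used, total_capacity)
    return all(
        abs(utilization(used, capacity) - cluster) <= threshold
        for used, capacity in nodes
    )


if __name__ == "__main__":
    # Hypothetical datanodes, given as (used, capacity) in the same unit.
    print(is_balanced([(50, 100), (55, 100), (45, 100)]))
    print(is_balanced([(90, 100), (10, 100)]))
```

In the second call the cluster utilization is 50%, and both nodes sit 40 percentage points away from it, so the cluster would count as balanced only with a threshold of 40 or more.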