This says that the file /user/tom/part-00007 is made up of one block and shows the datanodes where the block is located. The fsck options used are as follows:
▪ The -files option shows the line with the filename, size, number of blocks, and its health (whether there are any missing blocks).
▪ The -blocks option shows information about each block in the file, one line per block.
▪ The -racks option displays the rack location and the datanode addresses for each block.
Running hdfs fsck without any arguments displays full usage instructions.
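As a rough illustration of how the three options combine on one command line, the following sketch assembles the argument list for an fsck call. The helper name build_fsck_command is hypothetical and exists only for this example; the sketch builds the command but does not run it, since running it requires a Hadoop installation with hdfs on the PATH.

```python
def build_fsck_command(path, files=False, blocks=False, racks=False):
    """Return the argument list for an hdfs fsck call (hypothetical helper)."""
    command = ["hdfs", "fsck", path]
    if files:
        command.append("-files")   # filename, size, block count, health
    if blocks:
        command.append("-blocks")  # one line of detail per block
    if racks:
        command.append("-racks")   # rack location and datanode addresses
    return command


if __name__ == "__main__":
    cmd = build_fsck_command(
        "/user/tom/part-00007", files=True, blocks=True, racks=True
    )
    print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run on a machine where the hdfs command is available.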
Datanode block scanner
Every datanode runs a block scanner, which periodically verifies all the blocks stored on the datanode. This allows bad blocks to be detected and fixed before they are read by clients. The scanner maintains a list of blocks to verify and scans them one by one for checksum errors. It employs a throttling mechanism to preserve disk bandwidth on the datanode.
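The scan-and-throttle idea can be sketched in a few lines. This is not the datanode's actual code: the function name scan_blocks, the block IDs, the use of zlib.crc32 as the checksum, and the simple sleep-based throttle are all assumptions made for illustration.

```python
import time
import zlib


def scan_blocks(blocks, expected_checksums, max_bytes_per_sec, sleep=time.sleep):
    """Verify each block's checksum; return the IDs of blocks that fail.

    blocks maps a block ID to its bytes, and expected_checksums maps a
    block ID to the CRC-32 recorded when the block was written.
    """
    corrupt = []
    for block_id, data in blocks.items():
        if zlib.crc32(data) != expected_checksums[block_id]:
            corrupt.append(block_id)
        # Throttle: pause for as long as this block "costs" at the rate
        # limit, so the average read rate stays at or below the limit.
        sleep(len(data) / max_bytes_per_sec)
    return corrupt
```

Passing a different sleep callable (for example, a list's append method) makes the pauses observable without actually waiting, which is convenient when experimenting with the rate limit.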
Blocks are verified every three weeks to guard against disk errors over time (this period is controlled by the dfs.datanode.scan.period.hours property, which defaults to 504 hours). Corrupt blocks are reported to the namenode to be fixed.
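The 504-hour default is simply three weeks expressed in hours. The sketch below checks that arithmetic and renders the property in the usual Hadoop configuration XML form. The helper names scan_period_hours and hdfs_site_property are hypothetical, and placing the property in hdfs-site.xml is an assumption of this example.

```python
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7


def scan_period_hours(weeks):
    """Convert a scan period in weeks to hours."""
    return weeks * DAYS_PER_WEEK * HOURS_PER_DAY


def hdfs_site_property(name, value):
    """Render one <property> element in Hadoop configuration XML form."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    hours = scan_period_hours(3)  # 3 * 7 * 24
    print(hdfs_site_property("dfs.datanode.scan.period.hours", hours))
```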
You can get a block verification report for a datanode by visiting the datanode's web interface at http://datanode:50075/blockScannerReport. Here's an example of a report, which should be self-explanatory:
Total Blocks : 21131
Verified in last hour : 70
Verified in last day : 1767
Verified in last week : 7360
Verified in last four weeks : 20057
Verified in SCAN_PERIOD : 20057
Not yet verified : 1074
Verified since restart : 35912
Scans since restart : 6541
Scan errors since restart : 0
Transient scan errors : 0
Current scan rate limit KBps : 1024
Progress this period : 109%
Time left in cur period : 53.08%
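For monitoring, a report in this shape is easy to turn into a dictionary. The sketch below assumes every line has the "label : value" layout of the sample above; parse_scanner_report is a hypothetical helper, not part of Hadoop. In this sample the total block count minus the count verified in the scan period equals the "Not yet verified" figure, although that is an observation about this one report rather than a documented guarantee.

```python
def parse_scanner_report(text):
    """Parse 'label : value' lines into a dict, converting plain integers."""
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a label/value separator
        value = value.strip()
        report[key.strip()] = int(value) if value.isdigit() else value
    return report


# A few lines copied from the sample report above.
SAMPLE = """\
Total Blocks : 21131
Verified in SCAN_PERIOD : 20057
Not yet verified : 1074
Progress this period : 109%
"""

if __name__ == "__main__":
    report = parse_scanner_report(SAMPLE)
    print(report["Total Blocks"] - report["Verified in SCAN_PERIOD"])
```

Percentage fields such as "Progress this period" are left as strings, since the sketch only converts values made up entirely of digits.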