hadoop dfsadmin -report
When you need to take nodes in or out of service for maintenance, a useful
command is refreshNodes, which forces the NameNode to reread the list
of available DataNodes and any exclusions:
hadoop dfsadmin -refreshNodes
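As a rough sketch of how an exclusion might be prepared before running refreshNodes, the helper below appends a DataNode hostname to an exclude file, skipping hosts that are already listed. The function name exclude_node, the hostname, and the file path are all hypothetical; the sketch assumes the file is the one named by the dfs.hosts.exclude property in your hdfs-site.xml.

```shell
# Hypothetical helper: add a DataNode hostname to an exclude file, once.
# Assumption: the file is the one referenced by dfs.hosts.exclude
# in hdfs-site.xml on the NameNode.
exclude_node() {
  host="$1"
  exclude_file="$2"
  # Create the file if it does not exist yet.
  touch "$exclude_file"
  # -x matches whole lines only; -F treats the hostname as a literal string.
  if ! grep -qxF "$host" "$exclude_file"; then
    printf '%s\n' "$host" >> "$exclude_file"
  fi
}

# Example usage (hostname and path are placeholders):
#   exclude_node datanode07.example.com /etc/hadoop/conf/dfs.exclude
#   hadoop dfsadmin -refreshNodes
```

Keeping the file edit separate from the refreshNodes call lets you review the exclude list before asking the NameNode to reread it.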
HDFS generally corrects any file system problems that it encounters,
provided that the problem is correctable. In some cases, though, you might
want to check for errors explicitly. In that case, you can run the fsck
command, which checks the HDFS file system and reports any errors it
finds. You can also use fsck to move corrupted files to the /lost+found
directory (the -move option) or to delete them (the -delete option). This
command runs the file system check on the /user directory:
hadoop fsck /user
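Because moving or deleting corrupted files is hard to undo, it can help to review the exact command before running it. The sketch below only prints the fsck command line for a given path and action, rather than executing it. The helper name fsck_command and its action keywords are hypothetical; -move and -delete are the fsck options it emits.

```shell
# Hypothetical helper: print (not run) the fsck command for a path and action.
# Actions: report (default, check only), move (-move), delete (-delete).
fsck_command() {
  path="$1"
  action="${2:-report}"
  case "$action" in
    report) echo "hadoop fsck $path" ;;
    move)   echo "hadoop fsck $path -move" ;;
    delete) echo "hadoop fsck $path -delete" ;;
    *)
      echo "unknown action: $action" >&2
      return 1
      ;;
  esac
}

# Example usage:
#   fsck_command /user          # prints the report-only command
#   fsck_command /user move     # prints the command that moves corrupted files
```

Once the printed command looks right, you can run it directly from the shell.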
Overall, HDFS is designed to minimize administrative overhead. This
section has focused on the core administrative commands, giving you
enough information to get up and running without overwhelming you. For
more details on administering HDFS, you may want to review the document at
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html.
Moving and Organizing Data in HDFS
HDFS manages the data stored in the Hadoop cluster without requiring any
user intervention. In fact, a good portion of the design strategies used for
HDFS were adopted to support that goal: a system that minimizes the
amount of administration that you need to be concerned with. If you will
be working with small clusters, or data on the smaller end of big data, you
can safely skip this section. However, there are still scenarios in Hadoop
where you can get better performance and scalability by taking a more direct
approach, as this section covers.