Storing and Managing Data in HDFS - Microsoft Big Data Solutions


hadoop dfsadmin -report

When you need to take nodes in or out of service for maintenance, a useful command is refreshNodes, which forces the name node to reread the list of available DataNodes and any exclusions:

hadoop dfsadmin -refreshNodes
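As a rough sketch of how this fits into a maintenance workflow: the exclusions are typically kept in a plain-text file of host names (the file named by the dfs.hosts.exclude setting), so taking a DataNode out of service amounts to adding its host name to that file and then running refreshNodes. The function name decommission_node, the EXCLUDE_FILE path, and the HADOOP override variable below are assumptions made for illustration; adjust them to match your own cluster.

```shell
#!/bin/sh
# Sketch only: EXCLUDE_FILE is a hypothetical path. It must point at the
# file your cluster names in its dfs.hosts.exclude setting.
EXCLUDE_FILE="${EXCLUDE_FILE:-/etc/hadoop/conf/dfs.exclude}"
# HADOOP can be overridden (for example with "echo hadoop") for a dry run.
HADOOP="${HADOOP:-hadoop}"

# decommission_node HOST (hypothetical helper)
# Adds HOST to the exclude file, then asks the name node to reread its lists.
decommission_node() {
    node="$1"
    # Append the host only if it is not already listed, one host per line.
    grep -qxF -- "$node" "$EXCLUDE_FILE" 2>/dev/null || echo "$node" >> "$EXCLUDE_FILE"
    # Force the name node to reread the available and excluded DataNodes.
    $HADOOP dfsadmin -refreshNodes
}
```

Calling decommission_node with a host name (for instance, a placeholder such as dn3.example.com) records the host and triggers the refresh; running hadoop dfsadmin -report afterward is one way to confirm the node's new state.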

Generally, HDFS corrects any file system problems that it encounters, provided that the problem is correctable. In some cases, though, you might want to check for errors explicitly by running the fsck command, which checks the HDFS file system and reports any errors back to the user. You can also use the fsck command to move corrupted files to a specific folder (the /lost+found directory) or to delete them. The following command runs the file system check on the /user directory:

hadoop fsck /user
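The move and delete behaviors mentioned above are selected with the -move and -delete options of fsck. The small wrapper below is only a sketch that makes the three modes explicit; the function name check_path, its action keywords, and the HADOOP override variable are hypothetical and not part of Hadoop itself.

```shell
#!/bin/sh
# HADOOP can be overridden (for example with "echo hadoop") for a dry run.
HADOOP="${HADOOP:-hadoop}"

# check_path PATH [report|move|delete] (hypothetical helper)
check_path() {
    path="$1"
    action="${2:-report}"
    case "$action" in
        # Report problems only; nothing is changed.
        report) $HADOOP fsck "$path" ;;
        # Move corrupted files to /lost+found.
        move)   $HADOOP fsck "$path" -move ;;
        # Delete corrupted files; this cannot be undone.
        delete) $HADOOP fsck "$path" -delete ;;
        *)      echo "unknown action: $action" >&2; return 2 ;;
    esac
}
```

Running the plain report first and reviewing its output before choosing move or delete is the more cautious order, because deleted files are not recoverable.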

Overall, HDFS is designed to minimize administrative overhead. This section has focused on the core administrative information, enough to get you up and running without overwhelming you. For more details on administering HDFS, you may want to review the document at

Moving and Organizing Data in HDFS

HDFS manages the data stored in the Hadoop cluster without requiring any user intervention. In fact, a good portion of the design strategies used for HDFS were adopted to support that goal: a system that minimizes the amount of administration that you need to be concerned with. If you will be working with small clusters, or data on the smaller end of big data, you can safely skip this section. However, there are still scenarios in Hadoop where you can get better performance and scalability by taking a more direct approach, as this section covers.
