EXERCISE 9.1 (continued)
$ hadoop dfs -rmr -skipTrash /user/myuser/docs/
To read a text file, run the following commands (you can avoid spilling text on the terminal by using a pipe):
$ hadoop dfs -cat /user/myuser/docs/resume.txt
$ hadoop dfs -cat /user/myuser/docs/resume.txt | less
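If you only need to inspect the end of a large file, the -tail option prints the last kilobyte of the file instead of streaming all of it; the sketch below reuses the example file from above:
$ hadoop dfs -tail /user/myuser/docs/resume.txt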
To read compressed files (such as Zip) or encoded files (such as TextRecordInputStream), use the -text option:
$ hadoop dfs -text /user/myuser/docs/compressed_report.zip
$ hadoop dfs -text /user/myuser/docs/compressed_report.zip | less
EXERCISE 9.2
Killing a Hadoop Job and Avoiding Zombie Processes
To kill a Hadoop job, the user needs the job ID. The job ID is printed when a Hadoop job starts executing. Another, more formal, method is to use the Hadoop web interface, also known as the JobTracker UI (for a single-node setup, accessible at http://localhost:50030). The JobTracker displays information about running jobs, retired or finished jobs, and killed or failed jobs. To kill a job and avoid zombie processes, run the following:
$ hadoop job -kill <job-id>
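If you did not capture the job ID from the console output, you can also retrieve it from the command line; -list prints the IDs of currently running jobs (the job ID below is a made-up example of the usual job_<timestamp>_<sequence> form):
$ hadoop job -list
$ hadoop job -kill job_201301011234_0001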
EXERCISE 9.3
Resolving a Common IOException with HDFS
A common Java IOException can occur when the nodes are started or during the execution of a job. This happens because the HDFS .Trash directory is full. To resolve the issue, clear the HDFS .Trash directory and restart the cluster. Remember that this has to be done from the NameNode terminal because the NameNode is the master node.
$ hadoop dfs -rmr /user/myuser/.Trash/*
$ /bin/hadoop-install-path/bin/stop-all.sh
$ /bin/hadoop-install-path/bin/start-all.sh
To check whether the nodes (NameNode and DataNodes) have started, run the following on the NameNode terminal:
$ jps
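On a healthy single-node setup, jps should list all of the Hadoop daemons started by start-all.sh; the output looks roughly like the following (the process IDs shown are illustrative):
$ jps
4821 NameNode
4932 DataNode
5044 SecondaryNameNode
5161 JobTracker
5278 TaskTracker
5390 Jps
You can also confirm that the DataNodes have registered with the NameNode by running hadoop dfsadmin -report, which prints the configured capacity and the number of live DataNodes.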
 