Components of Hadoop - Hadoop in Action


configuration. For production clusters, the replication factor is typically 3 but can be any positive integer. The replication factor does not apply to directories, so they show only a dash (-) in that column.

After you've put data into HDFS, you can run Hadoop programs to process it. The output of the processing will be a new set of files in HDFS, and you'll want to read or retrieve the results.

RETRIEVING FILES

The Hadoop command get does the exact reverse of put. It copies files from HDFS to the local filesystem. Let's say we no longer have the example.txt file locally and we want to retrieve it from HDFS; we can run the command

hadoop fs -get example.txt .

to copy it into our local current working directory.
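When get is used in a script, it can help to check for an existing local copy first, since get typically refuses to overwrite a local file. The following is a minimal sketch under that assumption; the function name fetch_from_hdfs and its arguments are hypothetical and not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): copy a file from HDFS
# only when no local copy with the same name exists.
# Usage: fetch_from_hdfs <hdfs-path> [local-dir]
fetch_from_hdfs() {
    src=$1
    dest_dir=${2:-.}                      # default: current working directory
    target="$dest_dir/$(basename "$src")"

    if [ -e "$target" ]; then
        # Leave the existing local file untouched and report the skip.
        echo "skipped: $target already exists" >&2
        return 1
    fi
    hadoop fs -get "$src" "$dest_dir"
}
```

Calling fetch_from_hdfs example.txt would then behave like the get command above, except that it returns early when example.txt is already in the current directory.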

Another way to access the data is to display it. The Hadoop cat command allows us to do that.

hadoop fs -cat example.txt

We can use Hadoop file commands with Unix pipes to send their output to other Unix commands for further processing. For example, if the file is huge (as typical Hadoop files are) and you're interested in a quick check of its content, you can pipe the output of Hadoop's cat into a Unix head.

hadoop fs -cat example.txt | head
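If you run this kind of quick check often, the pipeline can be wrapped in a small shell function. This is only a sketch: the name hdfs_peek and the optional line-count argument are hypothetical conveniences, while cat and head are used exactly as above.

```shell
# Hypothetical helper (not part of Hadoop): print the first N lines
# of an HDFS file, defaulting to 10 lines as head does.
# Usage: hdfs_peek <hdfs-path> [line-count]
hdfs_peek() {
    hadoop fs -cat "$1" | head -n "${2:-10}"
}
```

For instance, hdfs_peek example.txt 3 would show only the first three lines of example.txt.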

Hadoop natively supports a tail command for looking at the last kilobyte of a file.

hadoop fs -tail example.txt

After you finish working with files in HDFS, you may want to delete them to free up space.

DELETING FILES

You shouldn't be too surprised by now that the Hadoop command for removing files is rm.

hadoop fs -rm example.txt

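The rm command can also be used to delete empty directories, though this behavior varies by release: newer versions of Hadoop reject directories passed to plain rm and provide rmdir for empty directories (or rm -r for non-empty ones) instead.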
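In cleanup scripts, a missing file usually should not count as an error. One possible approach, sketched below, is to check for the path with the test command's -e option before calling rm. The function name hdfs_rm_if_exists is hypothetical, and the sketch assumes your Hadoop version supports hadoop fs -test -e.

```shell
# Hypothetical helper (not part of Hadoop): delete an HDFS file
# only if it exists, so cleanup scripts do not fail on missing paths.
# Usage: hdfs_rm_if_exists <hdfs-path>
hdfs_rm_if_exists() {
    # "hadoop fs -test -e" exits with status 0 when the path exists.
    if hadoop fs -test -e "$1"; then
        hadoop fs -rm "$1"
    else
        echo "nothing to delete: $1" >&2
    fi
}
```

With this in place, hdfs_rm_if_exists example.txt removes the file when it is present and only prints a note to standard error otherwise.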

LOOKING UP HELP

A list of Hadoop file commands, together with the usage and description of each command, is given in the appendix. For the most part, the commands are modeled after their Unix equivalents. You can execute hadoop fs (with no parameters) to get a complete list of all commands available on your version of Hadoop. You can also use help to display the usage and a short description of each command. For example, to get a summary of ls, execute

hadoop fs -help ls
