configuration. For production clusters, the replication factor is typically 3 but can be
any positive integer. Replication factor is not applicable to directories, so they will only
show a dash (-) for that column.
After you've put data into HDFS, you can run Hadoop programs to process it. The
output of the processing will be a new set of files in HDFS, and you'll want to read or
retrieve the results.
RETRIEVING FILES
The Hadoop command get does the exact reverse of put: it copies files from HDFS to
the local filesystem. Let's say we no longer have the example.txt file locally and we want
to retrieve it from HDFS; we can run the command
hadoop fs -get example.txt .
to copy it into our local current working directory.
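As a minimal sketch of the retrieval step, the shell function below wraps hadoop fs -get so that the copy is skipped when a local file of the same name already exists. The function name fetch_from_hdfs and its arguments are hypothetical and chosen only for illustration; the only Hadoop call it relies on is the get command shown above.

```shell
# Hypothetical helper (not part of Hadoop): copy an HDFS file into a local
# directory unless a file with the same name is already there.
fetch_from_hdfs() {
  local src="$1"            # path of the file in HDFS
  local dest_dir="${2:-.}"  # local target directory, defaults to the current one
  local name="${src##*/}"   # file name without its HDFS directory part

  if [ -e "$dest_dir/$name" ]; then
    echo "skipping $name: already present in $dest_dir" >&2
    return 0
  fi
  hadoop fs -get "$src" "$dest_dir"
}
```

Calling fetch_from_hdfs example.txt would then behave like the command above, except that it does nothing if example.txt is already in the current directory.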
Another way to access the data is to display it. The Hadoop cat command allows us
to do that.
hadoop fs -cat example.txt
We can combine the Hadoop file commands with Unix pipes to send their output for further
processing by other Unix commands. For example, if the file is huge (as typical Hadoop
files are) and you're interested in a quick check of its content, you can pipe the output
of Hadoop's cat into a Unix head.
hadoop fs -cat example.txt | head
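The same piping pattern works with other Unix tools. The sketch below wraps two such pipelines in shell functions; the names hdfs_head and hdfs_count_matches are hypothetical, and the only Hadoop call used is the cat command shown above.

```shell
# Hypothetical helper: show the first N lines of an HDFS file (default 10).
hdfs_head() {
  local path="$1"
  local lines="${2:-10}"
  hadoop fs -cat "$path" | head -n "$lines"
}

# Hypothetical helper: count the lines of an HDFS file that match a pattern.
hdfs_count_matches() {
  local pattern="$1"
  local path="$2"
  hadoop fs -cat "$path" | grep -c -- "$pattern"
}
```

Note the difference in cost: head stops reading after a few lines, whereas grep -c has to stream the entire file through the local machine, which may be slow for a large file.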
Hadoop natively supports a tail command for looking at the last kilobyte of a file.
hadoop fs -tail example.txt
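Because Hadoop's tail returns a fixed amount of data rather than a fixed number of lines, its output may begin in the middle of a line. One way to get only the last few lines is to pipe it into the Unix tail, as in this sketch; the function name hdfs_last_lines is hypothetical.

```shell
# Hypothetical helper: print the last N lines (default 5) found in the
# final kilobyte of an HDFS file.
hdfs_last_lines() {
  local path="$1"
  local lines="${2:-5}"
  hadoop fs -tail "$path" | tail -n "$lines"
}
```

This only works when the requested lines fit within that last kilobyte; for longer lines the result will contain fewer lines than requested, and the first one may be cut off.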
After you finish working with files in HDFS, you may want to delete them to free up
space.
DELETING FILES
You shouldn't be too surprised by now that the Hadoop command for removing files
is rm.
hadoop fs -rm example.txt
The rm command can also be used to delete empty directories.
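In scripts it can be convenient to delete a file only when it is actually there. The sketch below assumes that your version of Hadoop provides the test file command with its -e (exists) option, which this section does not cover; the function name hdfs_rm_if_exists is hypothetical.

```shell
# Hypothetical helper: delete an HDFS file only if it exists.
hdfs_rm_if_exists() {
  local path="$1"
  # Assumption: "hadoop fs -test -e" exits with status 0 when the path exists.
  if hadoop fs -test -e "$path"; then
    hadoop fs -rm "$path"
  else
    echo "nothing to delete: $path" >&2
  fi
}
```

For example, hdfs_rm_if_exists example.txt removes the file if it is present and otherwise only prints a notice.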
LOOKING UP HELP
A list of Hadoop file commands, together with the usage and description of each
command, is given in the appendix. For the most part, the commands are modeled after
their Unix equivalents. You can execute hadoop fs (with no parameters) to get a
complete list of all commands available on your version of Hadoop. You can also use help
to display the usage and a short description of each command. For example, to get a
summary of ls, execute
hadoop fs -help ls