The Command-Line Interface
We're going to have a look at HDFS by interacting with it from the command line. There are many other interfaces to HDFS, but the command line is one of the simplest and, to many developers, the most familiar.

We are going to run HDFS on one machine, so first follow the instructions for setting up Hadoop in pseudodistributed mode in Appendix A. Later we'll see how to run HDFS on a cluster of machines to give us scalability and fault tolerance.
There are two properties that we set in the pseudodistributed configuration that deserve further explanation. The first is fs.defaultFS, set to hdfs://localhost/; here we have used an hdfs URI to configure Hadoop to use HDFS by default. The HDFS daemons will use this property to determine the host and port for the HDFS namenode. We'll be running it on localhost, on the default HDFS port, 8020. And HDFS clients will use this property to work out where the namenode is running so they can connect to it.
We set the second property, dfs.replication, to 1 so that HDFS doesn't replicate filesystem blocks by the default factor of three. When running with a single datanode, HDFS can't replicate blocks to three datanodes, so it would perpetually warn about blocks being under-replicated. This setting solves that problem.
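In a typical pseudodistributed setup, these two properties live in separate configuration files under Hadoop's configuration directory: fs.defaultFS in core-site.xml and dfs.replication in hdfs-site.xml. A minimal sketch of the relevant entries (the exact location of the configuration directory varies between Hadoop versions) might look like this:

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```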
Basic Filesystem Operations
The filesystem is ready to be used, and we can do all of the usual filesystem operations, such as reading files, creating directories, moving files, deleting data, and listing directories. You can type hadoop fs -help to get detailed help on every command.
Start by copying a file from the local filesystem to HDFS:
% hadoop fs -copyFromLocal input/docs/quangle.txt \
  hdfs://localhost/user/tom/quangle.txt
This command invokes Hadoop's filesystem shell command fs, which supports a number of subcommands; in this case, we are running -copyFromLocal. The local file quangle.txt is copied to the file /user/tom/quangle.txt on the HDFS instance running on localhost. In fact, we could have omitted the scheme and host of the URI and picked up the default, hdfs://localhost, as specified in core-site.xml:
% hadoop fs -copyFromLocal input/docs/quangle.txt /user/tom/quangle.txt
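As a quick check that the copy worked, you can list the destination directory with the -ls subcommand (shown here for the /user/tom directory used above):

```
% hadoop fs -ls /user/tom
```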