The Command-Line Interface
We're going to have a look at HDFS by interacting with it from the command line. There
are many other interfaces to HDFS, but the command line is one of the simplest and, to
many developers, the most familiar.
We are going to run HDFS on one machine, so first follow the instructions for setting up
Hadoop in pseudodistributed mode in Appendix A. Later we'll see how to run HDFS on a
cluster of machines to give us scalability and fault tolerance.
There are two properties that we set in the pseudodistributed configuration that deserve further explanation. The first is fs.defaultFS, set to hdfs://localhost/, which is used to set a default filesystem for Hadoop. [29] Filesystems are specified by a URI, and here we have used an hdfs URI to configure Hadoop to use HDFS by default. The HDFS daemons will use this property to determine the host and port for the HDFS namenode. We'll be running it on localhost, on the default HDFS port, 8020. HDFS clients will use this property to work out where the namenode is running so they can connect to it.
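For concreteness, a minimal core-site.xml along these lines might look like the following (the exact file contents in Appendix A may differ slightly):
<?xml version="1.0"?>
<!-- core-site.xml: make HDFS on localhost the default filesystem -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>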
We set the second property, dfs.replication, to 1 so that HDFS doesn't replicate filesystem blocks by the default factor of three. When running with a single datanode, HDFS can't replicate blocks to three datanodes, so it would perpetually warn about blocks being under-replicated. This setting solves that problem.
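This setting lives in hdfs-site.xml; again, a minimal sketch (Appendix A has the authoritative version):
<?xml version="1.0"?>
<!-- hdfs-site.xml: single datanode, so keep only one copy of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>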
Basic Filesystem Operations
The filesystem is ready to be used, and we can do all of the usual filesystem operations,
such as reading files, creating directories, moving files, deleting data, and listing directories. You can type hadoop fs -help to get detailed help on every command.
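For example, assuming HDFS is running and your home directory already exists, you could create a directory called books (a name chosen purely for illustration) and then list your home directory:
% hadoop fs -mkdir books
% hadoop fs -ls .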
Start by copying a file from the local filesystem to HDFS:
% hadoop fs -copyFromLocal input/docs/quangle.txt \
hdfs://localhost/user/tom/quangle.txt
This command invokes Hadoop's filesystem shell command fs, which supports a number of subcommands; in this case, we are running -copyFromLocal. The local file quangle.txt is copied to the file /user/tom/quangle.txt on the HDFS instance running on localhost. In fact, we could have omitted the scheme and host of the URI and picked up the default, hdfs://localhost, as specified in core-site.xml:
% hadoop fs -copyFromLocal input/docs/quangle.txt /user/tom/quangle.txt
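A quick way to confirm the copy is to print the file back from HDFS with the -cat subcommand:
% hadoop fs -cat /user/tom/quangle.txt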