</property>
</configuration>
HDFS
To run HDFS, you need to designate one machine as a namenode. In this case, the property fs.defaultFS is an HDFS filesystem URI whose host is the namenode's hostname or IP address and whose port is the port that the namenode will listen on for RPCs. If no port is specified, the default of 8020 is used.
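The way a client or daemon might turn fs.defaultFS into a namenode RPC address can be sketched with java.net.URI. This is a minimal illustration, not Hadoop's own code: the class NamenodeAddress and its rpcAddress method are hypothetical names, and the only rule taken from the text is that a missing port means 8020.

```java
import java.net.URI;

// Hypothetical sketch: derive the namenode RPC address from an fs.defaultFS value.
// Hadoop does this internally; these names are for illustration only.
public class NamenodeAddress {
    // Port the namenode listens on when the URI does not specify one.
    static final int DEFAULT_NAMENODE_RPC_PORT = 8020;

    // Returns "host:port" for an hdfs:// URI, filling in 8020 if no port is given.
    static String rpcAddress(String defaultFs) {
        URI uri = URI.create(defaultFs);
        if (!"hdfs".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Not an HDFS URI: " + defaultFs);
        }
        // URI.getPort() returns -1 when the URI has no explicit port.
        int port = uri.getPort() == -1 ? DEFAULT_NAMENODE_RPC_PORT : uri.getPort();
        return uri.getHost() + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(rpcAddress("hdfs://namenode/"));      // namenode:8020
        System.out.println(rpcAddress("hdfs://namenode:9000/")); // namenode:9000
    }
}
```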
The fs.defaultFS property also serves to specify the default filesystem. The default filesystem is used to resolve relative paths, which are handy because they save typing (and avoid hardcoding knowledge of a particular namenode's address). For example, with the default filesystem defined in Example 10-1, the relative URI /a/b is resolved to hdfs://namenode/a/b.
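Resolving a relative path against the default filesystem behaves much like standard URI resolution. The sketch below uses java.net.URI.resolve as a stand-in for Hadoop's own path qualification logic (an assumption made for illustration); the class name DefaultFsResolution is hypothetical, and hdfs://namenode/ is assumed to be the value set in Example 10-1.

```java
import java.net.URI;

// Hypothetical sketch: qualify a path such as /a/b against the default filesystem.
// java.net.URI.resolve stands in for Hadoop's own path qualification.
public class DefaultFsResolution {
    static URI qualify(URI defaultFs, String path) {
        // A path with no scheme or authority inherits both from the default filesystem.
        return defaultFs.resolve(path);
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode/"); // assumed to match Example 10-1
        System.out.println(qualify(defaultFs, "/a/b"));  // hdfs://namenode/a/b
    }
}
```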
NOTE
If you are running HDFS, the fact that fs.defaultFS is used to specify both the HDFS namenode and the default filesystem means HDFS has to be the default filesystem in the server configuration. Bear in mind, however, that it is possible to specify a different filesystem as the default in the client configuration, for convenience.
For example, if you use both HDFS and S3 filesystems, then you have a choice of specifying either as the default in the client configuration, which allows you to refer to the default with a relative URI and the other with an absolute URI.
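The client-side arrangement in the note can be illustrated the same way. In this hypothetical sketch, the s3a scheme and the bucket name my-bucket are assumptions chosen for illustration, and java.net.URI again stands in for Hadoop's path handling. The point is that a relative path picks up whichever default the client configures, while an absolute URI is left unchanged.

```java
import java.net.URI;

// Hypothetical sketch of per-client defaults: relative paths go to the client's
// default filesystem; absolute URIs (with a scheme) name their filesystem explicitly.
// The s3a scheme and bucket name "my-bucket" are illustrative assumptions.
public class ClientDefaults {
    static URI qualify(URI clientDefaultFs, String path) {
        // URI.resolve returns an already-absolute URI unchanged.
        return clientDefaultFs.resolve(path);
    }

    public static void main(String[] args) {
        URI hdfsDefault = URI.create("hdfs://namenode/");
        URI s3Default = URI.create("s3a://my-bucket/");

        // Client with HDFS as its default: S3 must be named explicitly.
        System.out.println(qualify(hdfsDefault, "/logs"));                // hdfs://namenode/logs
        System.out.println(qualify(hdfsDefault, "s3a://my-bucket/logs")); // s3a://my-bucket/logs

        // Client with S3 as its default: HDFS must be named explicitly.
        System.out.println(qualify(s3Default, "/logs"));                  // s3a://my-bucket/logs
        System.out.println(qualify(s3Default, "hdfs://namenode/logs"));   // hdfs://namenode/logs
    }
}
```

Swapping the default changes only which filesystem the short form refers to; absolute URIs mean the same thing under either client configuration.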
There are a few other configuration properties you should set for HDFS: those that set the storage directories for the namenode and for datanodes. The property dfs.namenode.name.dir specifies a list of directories where the namenode stores persistent filesystem metadata (the edit log and the filesystem image). A copy of each metadata file is stored in each directory for redundancy. It's common to configure dfs.namenode.name.dir so that the namenode metadata is written to one or two local disks, as well as a remote disk, such as an NFS-mounted directory. Such a setup guards against failure of a local disk and failure of the entire namenode, since in both cases the files can be recovered and used to start a new namenode. (The secondary namenode takes only periodic checkpoints of the namenode, so it does not provide an up-to-date backup of the namenode.)
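The redundancy scheme for dfs.namenode.name.dir amounts to writing each metadata file in full to every listed directory. The following is a simplified, hypothetical sketch of that idea, not the namenode's storage code: it assumes the property value is a comma-separated list of plain local paths, and the file name fsimage and the temporary directories standing in for local disks and an NFS mount are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the dfs.namenode.name.dir redundancy idea:
// every metadata file is written in full to each configured directory.
public class NameDirMirror {
    // Splits a comma-separated property value into directory paths, skipping blanks.
    static List<Path> parseDirs(String nameDirValue) {
        List<Path> dirs = new ArrayList<>();
        for (String part : nameDirValue.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(Paths.get(trimmed));
            }
        }
        return dirs;
    }

    // Writes an identical copy of the file into every directory, creating it if needed.
    static void writeToAll(List<Path> dirs, String fileName, byte[] contents) throws IOException {
        for (Path dir : dirs) {
            Files.createDirectories(dir);
            Files.write(dir.resolve(fileName), contents);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for two local disks and an NFS mount, created under a temp directory.
        Path base = Files.createTempDirectory("name-dirs");
        String value = base.resolve("disk1") + "," + base.resolve("disk2") + "," + base.resolve("nfs");
        List<Path> dirs = parseDirs(value);
        writeToAll(dirs, "fsimage", "metadata".getBytes(StandardCharsets.UTF_8));
        for (Path dir : dirs) {
            Path copy = dir.resolve("fsimage");
            System.out.println(copy + " exists: " + Files.exists(copy));
        }
    }
}
```

Because every directory holds a complete copy, losing any one of them (say, a failed local disk) still leaves full metadata in the others.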
You should also set the dfs.datanode.data.dir property, which specifies a list of
directories for a datanode to store its blocks in. Unlike the namenode, which uses multiple
directories for redundancy, a datanode round-robins writes between its storage directories,