Configuring Hadoop
Hadoop must have its configuration set appropriately to run in distributed mode on a cluster. The important configuration settings to achieve this are discussed in Hadoop Configuration.
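To give a sense of what this involves, the single most important setting for distributed operation is fs.defaultFS in core-site.xml, which tells clients and daemons where the namenode runs. A minimal sketch (the hostname namenode-host is a placeholder for your own namenode's address):

<?xml version="1.0"?>
<configuration>
  <property>
    <!-- URI of the namenode; HDFS clients and daemons connect here -->
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host/</value>
  </property>
</configuration>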
Formatting the HDFS Filesystem
Before it can be used, a brand-new HDFS installation needs to be formatted. The formatting process creates an empty filesystem by creating the storage directories and the initial
versions of the namenode's persistent data structures. Datanodes are not involved in the
initial formatting process, since the namenode manages all of the filesystem's metadata,
and datanodes can join or leave the cluster dynamically. For the same reason, you don't
need to say how large a filesystem to create, since this is determined by the number of
datanodes in the cluster, which can be increased as needed, long after the filesystem is
formatted.
Formatting HDFS is a fast operation. Run the following command as the hdfs user:
% hdfs namenode -format
Starting and Stopping the Daemons
Hadoop comes with scripts for running commands and starting and stopping daemons
across the whole cluster. To use these scripts (which can be found in the sbin directory),
you need to tell Hadoop which machines are in the cluster. There is a file for this purpose,
called slaves, which contains a list of the machine hostnames or IP addresses, one per line. The slaves file lists the machines that the datanodes and node managers should run on. It resides in Hadoop's configuration directory, although it may be placed elsewhere (and given another name) by changing the HADOOP_SLAVES setting in hadoop-env.sh. Also, this file does not need to be distributed to worker nodes, since it is used only by the control scripts running on the namenode or resource manager.
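For example, a slaves file for a cluster with three workers might contain the following (these hostnames are placeholders):

worker1.example.com
worker2.example.com
worker3.example.com

Each machine listed here will run a datanode and a node manager when the control scripts are invoked.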
The HDFS daemons are started by running the following command as the hdfs user:
% start-dfs.sh
The machine (or machines) that the namenode and secondary namenode run on is determined by interrogating the Hadoop configuration for their hostnames. For example, the
script finds the namenode's hostname by executing the following:
% hdfs getconf -namenodes
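The HDFS daemons can be stopped with the companion script, again run as the hdfs user:
% stop-dfs.sh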