WHICH PROPERTIES CAN I SET?
ConfigurationPrinter is a useful tool for discovering what a property is set to in your environment. For a running daemon, like the namenode, you can see its configuration by viewing the /conf page on its web server. (See Table 10-6 to find port numbers.)
You can also see the default settings for all the public properties in Hadoop by looking in the share/doc directory of your Hadoop installation for files called core-default.xml, hdfs-default.xml, yarn-default.xml, and mapred-default.xml. Each property has a description that explains what it is for and what values it can be set to.
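As a rough illustration of reading such a file, the sketch below parses a small XML snippet laid out the way Hadoop configuration files are (a configuration element containing property elements with name, value, and description children). The sample properties are hypothetical and not taken from any real release, and read_properties is a helper name invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the style of a *-default.xml file; these
# properties are illustrative and are not copied from a real release.
SAMPLE_XML = """\
<configuration>
  <property>
    <name>color</name>
    <value>yellow</value>
    <description>Example property used for illustration.</description>
  </property>
  <property>
    <name>size</name>
    <value>10</value>
    <description>Another illustrative property.</description>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return {name: (value, description)} for each <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        # Collapse internal whitespace so multi-line descriptions print cleanly.
        description = " ".join(prop.findtext("description", default="").split())
        if name:
            props[name] = (value, description)
    return props


if __name__ == "__main__":
    for name, (value, description) in sorted(read_properties(SAMPLE_XML).items()):
        print(f"{name}={value}  # {description}")
```

To try it against a real defaults file, you could pass the contents of one of the files named above instead of SAMPLE_XML.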
Documentation for the default settings files is available online at pages linked from http://hadoop.apache.org/docs/current/ (look for the “Configuration” heading in the navigation). You can find the defaults for a particular Hadoop release by replacing current in the preceding URL with r<version>; for example, http://hadoop.apache.org/docs/r2.5.0/.
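The URL substitution just described can be sketched in a few lines. The function name docs_url_for_release is made up for this example, and the sketch only builds the string; it does not check that the page exists for a given release.

```python
# Base documentation URL quoted in the text above.
BASE_URL = "http://hadoop.apache.org/docs/current/"


def docs_url_for_release(version):
    """Swap the 'current' path segment for 'r<version>'."""
    return BASE_URL.replace("/current/", f"/r{version}/")


if __name__ == "__main__":
    print(docs_url_for_release("2.5.0"))
```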
Be aware that some properties have no effect when set in the client configuration. For example, if you set yarn.nodemanager.resource.memory-mb in your job submission with the expectation that it would change the amount of memory available to the node managers running your job, you would be disappointed, because this property is honored only if set in the node manager's yarn-site.xml file. In general, you can tell from a property's name which component it should be set for, so the fact that yarn.nodemanager.resource.memory-mb starts with yarn.nodemanager gives you a clue that it can be set only for the node manager daemon. This is not a hard and fast rule, however, so in some cases you may need to resort to trial and error, or even to reading the source.
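The naming heuristic can be written down as a simple prefix lookup. This is only a sketch: the PREFIX_HINTS table is a hypothetical, deliberately tiny mapping built from the prefixes mentioned in this section, and, as noted above, a prefix is a clue and not a guarantee.

```python
# Hypothetical prefix-to-component hints, for illustration only.
# A real cluster has many more prefixes, and the rule is not absolute.
PREFIX_HINTS = {
    "yarn.nodemanager": "node manager",
    "dfs.namenode": "namenode",
}


def guess_component(property_name):
    """Return the component suggested by the longest matching prefix, or None."""
    best_prefix = None
    for prefix in PREFIX_HINTS:
        # Match whole dot-separated segments only.
        if property_name == prefix or property_name.startswith(prefix + "."):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return PREFIX_HINTS[best_prefix] if best_prefix else None


if __name__ == "__main__":
    for name in ("yarn.nodemanager.resource.memory-mb", "mapreduce.job.name"):
        print(name, "->", guess_component(name))
```

A result of None simply means the name gives no hint, which is the case where trial and error or reading the source comes in.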
Configuration property names have changed in Hadoop 2 onward, in order to give them a more regular naming structure. For example, the HDFS properties pertaining to the namenode have been changed to have a dfs.namenode prefix, so dfs.name.dir is now dfs.namenode.name.dir. Similarly, MapReduce properties have the mapreduce prefix rather than the older mapred prefix, so mapred.job.name is now mapreduce.job.name.
This topic uses the new property names to avoid deprecation warnings. The old property names still work, however, and they are often referred to in older documentation. You can find a table listing the deprecated property names and their replacements on the Hadoop website.
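When updating older configuration, the renaming can be modeled as a lookup table. The sketch below uses only the two renames mentioned in this section; the modernize helper and its warning text are invented for illustration and do not reproduce how Hadoop itself handles deprecated keys.

```python
# Old-to-new names taken from the examples in this section only;
# the full table of deprecated properties is much longer.
DEPRECATED = {
    "dfs.name.dir": "dfs.namenode.name.dir",
    "mapred.job.name": "mapreduce.job.name",
}


def modernize(props):
    """Return (new_props, warnings) with deprecated keys renamed.

    If both the old and the new name are present, the value given under
    the new name is kept (an assumption made for this sketch).
    """
    new_props = {}
    warnings = []
    # First copy every key that is not deprecated.
    for key, value in props.items():
        if key not in DEPRECATED:
            new_props[key] = value
    # Then map deprecated keys without overwriting explicit new-name values.
    for key, value in props.items():
        if key in DEPRECATED:
            replacement = DEPRECATED[key]
            warnings.append(f"{key} is deprecated; use {replacement}")
            new_props.setdefault(replacement, value)
    return new_props, warnings


if __name__ == "__main__":
    updated, notes = modernize({"mapred.job.name": "wordcount", "color": "yellow"})
    print(updated)
    for note in notes:
        print("warning:", note)
```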
We discuss many of Hadoop's most important configuration properties throughout this topic.
GenericOptionsParser also allows you to set individual properties. For example:
% hadoop ConfigurationPrinter -D color=yellow | grep color
color=yellow
Here, the -D option is used to set the configuration property with key color to the value yellow. Options specified with -D take priority over properties from the configuration files. This is very useful because you can put defaults into configuration files and then override them with the -D option as needed. A common example of this is setting the number of reducers for a MapReduce job via -D mapreduce.job.reduces=n. This