Java
The location of the Java implementation to use is determined by the JAVA_HOME setting in
hadoop-env.sh or the JAVA_HOME shell environment variable, if not set in hadoop-env.sh. It's a
good idea to set the value in hadoop-env.sh, so that it is clearly defined in one place and to
ensure that the whole cluster is using the same version of Java.
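As a minimal sketch of the setting described above, the line below could go in hadoop-env.sh. The JDK path is a hypothetical example; substitute the location where Java is actually installed on your nodes.

```shell
# hadoop-env.sh
# Hypothetical JDK location, shown for illustration only.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```

Using the same path on every node makes it easier to confirm that the whole cluster runs the same Java version.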
Memory heap size
By default, Hadoop allocates 1,000 MB (1 GB) of memory to each daemon it runs. This is
controlled by the HADOOP_HEAPSIZE setting in hadoop-env.sh. There are also environment
variables to allow you to change the heap size for a single daemon. For example, you can set
YARN_RESOURCEMANAGER_HEAPSIZE in yarn-env.sh to override the heap size for the resource manager.
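The two heap settings named above might be written as follows. This is a sketch only: the values 2000 and 3000 are illustrative assumptions, not recommendations, and both are interpreted as megabytes.

```shell
# hadoop-env.sh
# Heap size, in MB, for every Hadoop daemon (illustrative value).
export HADOOP_HEAPSIZE=2000

# yarn-env.sh
# Heap size, in MB, for the resource manager only (illustrative value).
export YARN_RESOURCEMANAGER_HEAPSIZE=3000
```

The comments mark which file each line belongs to; in practice the two settings live in separate files.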
Surprisingly, there are no corresponding environment variables for HDFS daemons, despite it
being very common to give the namenode more heap space. There is another way to set the
namenode heap size, however; this is discussed in the following sidebar.
HOW MUCH MEMORY DOES A NAMENODE NEED?
A namenode can eat up memory, since a reference to every block of every file is maintained in memory.
It's difficult to give a precise formula because memory usage depends on the number of blocks per file,
the filename length, and the number of directories in the filesystem; plus, it can change from one Hadoop
release to another.
The default of 1,000 MB of namenode memory is normally enough for a few million files, but as a rule
of thumb for sizing purposes, you can conservatively allow 1,000 MB per million blocks of storage.
For example, a 200-node cluster with 24 TB of disk space per node, a block size of 128 MB, and a
replication factor of 3 has room for about 12 million blocks (or more): 200 × 24,000,000 MB ⁄ (128 MB × 3).
So in this case, setting the namenode memory to 12,000 MB would be a good starting point.
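The sizing arithmetic can be checked with a few lines of shell. This is only a sketch of the rule of thumb quoted above (1,000 MB per million blocks); the variable names are made up for illustration, and 24 TB is taken as 24,000,000 MB, as in the formula.

```shell
# Cluster parameters from the example (variable names are illustrative).
nodes=200
disk_mb_per_node=24000000   # 24 TB per node, expressed in MB
block_size_mb=128
replication=3

# Number of distinct blocks the cluster has room for.
blocks=$(( nodes * disk_mb_per_node / (block_size_mb * replication) ))

# Rule of thumb: 1,000 MB of namenode heap per million blocks.
heap_mb=$(( blocks * 1000 / 1000000 ))

echo "blocks: $blocks, suggested namenode heap: $heap_mb MB"
```

The result is 12,500,000 blocks and roughly 12,500 MB of heap, which is in line with a starting point of about 12,000 MB.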
You can increase the namenode's memory without changing the memory allocated to other Hadoop daemons
by setting HADOOP_NAMENODE_OPTS in hadoop-env.sh to include a JVM option for setting the memory size.
HADOOP_NAMENODE_OPTS allows you to pass extra options to the namenode's JVM. So, for example, if you
were using a Sun JVM, -Xmx2000m would specify that 2,000 MB of memory should be allocated to the namenode.
If you change the namenode's memory allocation, don't forget to do the same for the secondary namenode
(using the HADOOP_SECONDARYNAMENODE_OPTS variable), since its memory requirements are comparable to the
primary namenode's.
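As a sketch, the two variables described in this sidebar could be set in hadoop-env.sh like this. The -Xmx2000m value is the one from the example above. Appending to any existing value, rather than overwriting it, is an assumption made here so that options set elsewhere are preserved.

```shell
# hadoop-env.sh
# Give the namenode a 2,000 MB maximum heap, keeping any options already set.
export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS:-} -Xmx2000m"

# Match the setting for the secondary namenode, whose needs are comparable.
export HADOOP_SECONDARYNAMENODE_OPTS="${HADOOP_SECONDARYNAMENODE_OPTS:-} -Xmx2000m"
```

Other daemons are unaffected, because these variables are passed only to the namenode and secondary namenode JVMs.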