Setting Up a Hadoop Cluster - Hadoop: The Definitive Guide

Java

The location of the Java implementation to use is determined by the JAVA_HOME setting in hadoop-env.sh or the JAVA_HOME shell environment variable, if not set in hadoop-env.sh. It's a good idea to set the value in hadoop-env.sh, so that it is clearly defined in one place and to ensure that the whole cluster is using the same version of Java.
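
As a minimal sketch of that setting, the line in hadoop-env.sh might look like the following. The JDK path shown is a hypothetical example, not one taken from the text; substitute the actual install location used on your nodes.

```shell
# hadoop-env.sh
# Pin the Java implementation for every Hadoop daemon started on this node.
# NOTE: the path below is an assumed example location, not a required one.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```

Distributing the same hadoop-env.sh to every node is what keeps the Java version consistent across the cluster.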

Memory heap size

By default, Hadoop allocates 1,000 MB (1 GB) of memory to each daemon it runs. This is controlled by the HADOOP_HEAPSIZE setting in hadoop-env.sh. There are also environment variables to allow you to change the heap size for a single daemon. For example, you can set YARN_RESOURCEMANAGER_HEAPSIZE in yarn-env.sh to override the heap size for the resource manager.
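
The two settings just described could be sketched as follows. Both values are in megabytes; the 2,000 MB figure for the resource manager is an illustrative assumption, not a recommendation from the text.

```shell
# hadoop-env.sh
# Heap size in MB applied to each Hadoop daemon (1000 is the default).
export HADOOP_HEAPSIZE=1000

# yarn-env.sh
# Override the heap size in MB for the resource manager only.
# NOTE: 2000 is an assumed example value.
export YARN_RESOURCEMANAGER_HEAPSIZE=2000
```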

Surprisingly, there are no corresponding environment variables for HDFS daemons, despite it being very common to give the namenode more heap space. There is another way to set the namenode heap size, however; this is discussed in the following sidebar.

HOW MUCH MEMORY DOES A NAMENODE NEED?

A namenode can eat up memory, since a reference to every block of every file is maintained in memory.

It's difficult to give a precise formula because memory usage depends on the number of blocks per file,

the filename length, and the number of directories in the filesystem; plus, it can change from one Hadoop

release to another.

The default of 1,000 MB of namenode memory is normally enough for a few million files, but as a rule

of thumb for sizing purposes, you can conservatively allow 1,000 MB per million blocks of storage.

For example, a 200-node cluster with 24 TB of disk space per node, a block size of 128 MB, and a replication factor of 3 has room for about 12 million blocks (or more): 200 × 24,000,000 MB ⁄ (128 MB × 3). So in this case, setting the namenode memory to 12,000 MB would be a good starting point.
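
The sizing arithmetic can be checked with a few lines of shell. This is only a sketch of the rule of thumb (1,000 MB of heap per million blocks); the variable names are made up for illustration.

```shell
# Cluster parameters from the example (sizes in MB).
nodes=200
disk_per_node_mb=24000000
block_size_mb=128
replication=3

# Each block is stored 'replication' times, so divide raw capacity
# by the space one fully replicated block occupies.
blocks=$(( nodes * disk_per_node_mb / (block_size_mb * replication) ))

# Rule of thumb: 1,000 MB per million blocks, i.e. 1 MB per 1,000 blocks.
heap_mb=$(( blocks / 1000 ))

echo "blocks:  $blocks"    # prints "blocks:  12500000"
echo "heap MB: $heap_mb"   # prints "heap MB: 12500"
```

The result, roughly 12.5 million blocks and 12,500 MB, is in line with the 12,000 MB starting point.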

You can increase the namenode's memory without changing the memory allocated to other Hadoop daemons by setting HADOOP_NAMENODE_OPTS in hadoop-env.sh to include a JVM option for setting the memory size. HADOOP_NAMENODE_OPTS allows you to pass extra options to the namenode's JVM. So, for example, if you were using a Sun JVM, -Xmx2000m would specify that 2,000 MB of memory should be allocated to the namenode.

If you change the namenode's memory allocation, don't forget to do the same for the secondary namenode (using the HADOOP_SECONDARYNAMENODE_OPTS variable), since its memory requirements are comparable to the primary namenode's.
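
Put together, the two variables might be set in hadoop-env.sh roughly as follows. This sketch reuses the -Xmx2000m example from above and appends it to any options already present; preserving the existing options is an assumption about how you want the file to behave, not something the text prescribes.

```shell
# hadoop-env.sh
# Give the namenode JVM a 2,000 MB maximum heap, keeping any options
# that were already set (the ${VAR:-} form tolerates an unset variable).
export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS:-} -Xmx2000m"

# The secondary namenode has comparable memory requirements, so match it.
export HADOOP_SECONDARYNAMENODE_OPTS="${HADOOP_SECONDARYNAMENODE_OPTS:-} -Xmx2000m"
```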