Memory settings in YARN and MapReduce
YARN treats memory in a more fine-grained manner than the slot-based model used in
MapReduce 1. Rather than specifying a fixed maximum number of map and reduce slots
that may run on a node at once, YARN allows applications to request an arbitrary amount
of memory (within limits) for a task. In the YARN model, node managers allocate
memory from a pool, so the number of tasks that are running on a particular node depends
on the sum of their memory requirements, and not simply on a fixed number of slots.
The calculation for how much memory to dedicate to a node manager for running containers depends on the amount of physical memory on the machine. Each Hadoop daemon uses 1,000 MB, so for a datanode and a node manager the total is 2,000 MB. Set aside enough for other processes that are running on the machine, and the remainder can be dedicated to the node manager's containers by setting the configuration property yarn.nodemanager.resource.memory-mb to the total allocation in MB. (The default is 8,192 MB, which is normally too low for most setups.)
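As a rough sketch of this calculation (the 64 GB machine and the amounts reserved for daemons and the operating system are illustrative assumptions, not recommendations), the remaining memory could be dedicated to containers in yarn-site.xml:

    <!-- yarn-site.xml on a hypothetical worker with 64 GB of physical memory. -->
    <!-- Reserve ~2,000 MB for the datanode and node manager daemons, plus     -->
    <!-- several more GB for the OS and other processes; give the rest         -->
    <!-- (here 56 GB = 57,344 MB) to this node manager's containers.           -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>57344</value>
    </property>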
The next step is to determine how to set memory options for individual jobs. There are
two main controls: one for the size of the container allocated by YARN, and another for
the heap size of the Java process run in the container.
NOTE
The memory controls for MapReduce are all set by the client in the job configuration. The YARN settings are cluster settings and cannot be modified by the client.
Container sizes are determined by mapreduce.map.memory.mb and mapreduce.reduce.memory.mb; both default to 1,024 MB. These settings are used by the application master when negotiating for resources in the cluster, and also by the node manager, which runs and monitors the task containers. The heap size of the Java process is set by mapred.child.java.opts, and defaults to 200 MB. You can also set the Java options separately for map and reduce tasks (see Table 10-4).
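For example (a sketch only: the 4 GB container and 3 GB heap are illustrative figures, chosen so the heap fits inside the container with room left for non-heap JVM memory), a job whose map tasks need larger containers might set the following in its job configuration, for instance via -D options when the job is submitted:

    <!-- Job configuration set by the client for this job only -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>4096</value>       <!-- size of the YARN container for each map task -->
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx3072m</value>  <!-- Java heap, kept smaller than the container -->
    </property>

The heap is set below mapreduce.map.memory.mb because a container that uses more memory than it was allocated may be killed by the node manager; leaving headroom for the JVM's non-heap memory avoids this.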