When configuring memory parameters it's very useful to be able to monitor a task's actual memory usage during a job run, and this is possible via MapReduce task counters. The counters PHYSICAL_MEMORY_BYTES, VIRTUAL_MEMORY_BYTES, and COMMITTED_HEAP_BYTES (described in Table 9-2) provide snapshot values of memory usage and are therefore suitable for observation during the course of a task attempt.
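The counter values can also be retrieved programmatically. The following is a minimal sketch (the class name here is made up, but TaskCounter and the Cluster/Job API are standard Hadoop 2 classes) that prints a job's aggregate memory counters given its job ID:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.TaskCounter;

// Hypothetical utility: prints a job's memory counters, aggregated
// across its tasks. Pass a job ID (e.g., job_1419158657431_0001).
public class PrintMemoryCounters {
  public static void main(String[] args) throws Exception {
    Cluster cluster = new Cluster(new Configuration());
    Job job = cluster.getJob(JobID.forName(args[0]));
    Counters counters = job.getCounters();
    System.out.printf("physical memory: %d bytes%n",
        counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).getValue());
    System.out.printf("virtual memory:  %d bytes%n",
        counters.findCounter(TaskCounter.VIRTUAL_MEMORY_BYTES).getValue());
    System.out.printf("committed heap:  %d bytes%n",
        counters.findCounter(TaskCounter.COMMITTED_HEAP_BYTES).getValue());
  }
}

Per-task-attempt values are also visible in the job's web UI while it runs.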
Hadoop also provides settings to control how much memory is used for MapReduce operations. These can be set on a per-job basis and are covered in Shuffle and Sort.
CPU settings in YARN and MapReduce
In addition to memory, YARN treats CPU usage as a managed resource, and applications can request the number of cores they need. The number of cores that a node manager can allocate to containers is controlled by the yarn.nodemanager.resource.cpu-vcores property. It should be set to the total number of cores on the machine, minus a core for each daemon process running on the machine (datanode, node manager, and any other long-running processes).
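For example, on an 8-core machine that also runs a datanode and a node manager, the property might be set in yarn-site.xml as follows (a sketch; the right value depends on your hardware and daemons):

<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <!-- 8 cores, minus one each for the datanode and the node manager -->
  <value>6</value>
</property>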
MapReduce jobs can control the number of cores allocated to map and reduce containers by setting mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores. Both default to 1, an appropriate setting for normal single-threaded MapReduce tasks, which can only saturate a single core.
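For a job whose tasks are deliberately multithreaded (for example, one using MultithreadedMapper), the driver might request more map vcores. This fragment is a sketch, with the rest of the job setup omitted:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Ask the scheduler for two cores per map container;
// reducers keep the single-core default.
conf.setInt("mapreduce.map.cpu.vcores", 2);
Job job = Job.getInstance(conf, "multithreaded mapper job");
// ... set mapper, input/output paths, and so on, then submit as usual.

When the driver uses ToolRunner, the same properties can equally be set from the command line with -D.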
WARNING
While the number of cores is tracked during scheduling (so a container won't be allocated on a machine where there are no spare cores, for example), the node manager will not, by default, limit actual CPU usage of running containers. This means that a container can abuse its allocation by using more CPU than it was given, possibly starving other containers running on the same host. YARN has support for enforcing CPU limits using Linux cgroups. The node manager's container executor class (yarn.nodemanager.container-executor.class) must be set to use the LinuxContainerExecutor class, which in turn must be configured to use cgroups (see the properties under yarn.nodemanager.linux-container-executor).
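A yarn-site.xml fragment along these lines enables cgroups-based enforcement (a sketch using the Hadoop 2 property names; check your distribution's documentation for the full set of cgroups options, such as the mount path and executor group):

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- The cgroup hierarchy under which YARN places containers -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/hadoop-yarn</value>
</property>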
Hadoop Daemon Addresses and Ports
Hadoop daemons generally run both an RPC server for communication between daemons (Table 10-5) and an HTTP server to provide web pages for human consumption (Table 10-6). Each server is configured by setting the network address and port number to listen on.
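For instance, the namenode's HTTP server is controlled by dfs.namenode.http-address; a hypothetical hdfs-site.xml override might look like this (0.0.0.0 means listen on all of the machine's addresses):

<property>
  <name>dfs.namenode.http-address</name>
  <value>0.0.0.0:50070</value>
</property>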