When configuring memory parameters it's very useful to be able to monitor a task's actual memory usage during a job run, and this is possible via MapReduce task counters. The counters PHYSICAL_MEMORY_BYTES, VIRTUAL_MEMORY_BYTES, and COMMITTED_HEAP_BYTES (described in Table 9-2) provide snapshot values of memory usage and are therefore suitable for observation during the course of a task attempt.
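For example, a job's counters can be retrieved from the command line with the mapred job -counter subcommand; these task counters live in the org.apache.hadoop.mapreduce.TaskCounter group. This is a sketch against a running cluster, and the job ID shown is illustrative:

```shell
# Print the aggregated physical memory snapshot for a job's tasks.
# Replace the job ID with one from your cluster.
mapred job -counter job_1414159253450_0002 \
    org.apache.hadoop.mapreduce.TaskCounter PHYSICAL_MEMORY_BYTES
```

The same counters are also visible per task attempt in the MapReduce web UI, which is often more convenient for watching usage evolve during a run.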
Hadoop also provides settings to control how much memory is used for MapReduce operations. These can be set on a per-job basis and are covered in Shuffle and Sort.
CPU settings in YARN and MapReduce
In addition to memory, YARN treats CPU usage as a managed resource, and applications can request the number of cores they need. The number of cores that a node manager can allocate to containers is controlled by the yarn.nodemanager.resource.cpu-vcores property. It should be set to the total number of cores on the machine, minus a core for each daemon process running on the machine (datanode, node manager, and any other long-running processes).
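As a sketch of this rule, a hypothetical 16-core worker running a datanode and a node manager would be given 16 − 2 = 14 vcores in yarn-site.xml:

```xml
<!-- yarn-site.xml: 16 cores, minus one each for the datanode
     and node manager daemons, leaves 14 for containers -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>14</value>
</property>
```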
MapReduce jobs can control the number of cores allocated to map and reduce containers by setting mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores. Both default to 1, an appropriate setting for normal single-threaded MapReduce tasks, which can only saturate a single core.
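A job that runs multithreaded map tasks might request more than the default; as an illustrative per-job setting (the value here is an assumption for a task using two threads):

```xml
<!-- Per-job configuration: request two cores for each map container -->
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>2</value>
</property>
```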
WARNING
While the number of cores is tracked during scheduling (so a container won't be allocated on a machine where there are no spare cores, for example), the node manager will not, by default, limit actual CPU usage of running containers. This means that a container can abuse its allocation by using more CPU than it was given, possibly starving other containers running on the same host. YARN has support for enforcing CPU limits using Linux cgroups. The node manager's container executor class (yarn.nodemanager.container-executor.class) must be set to use the LinuxContainerExecutor class, which in turn must be configured to use cgroups (see the properties under yarn.nodemanager.linux-container-executor).
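A minimal yarn-site.xml sketch of that setup might look like the following; the resources-handler property and class names reflect the Hadoop 2 conventions for the yarn.nodemanager.linux-container-executor prefix, so check them against your version's defaults:

```xml
<!-- yarn-site.xml: switch to the Linux container executor ... -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<!-- ... and have it enforce resource limits via cgroups -->
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
```

Note that the LinuxContainerExecutor also has operational prerequisites of its own (such as the setuid container-executor binary), so enabling it involves more than these two properties.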
Hadoop Daemon Addresses and Ports
Hadoop daemons generally run both an RPC server for communication between daemons (Table 10-5) and an HTTP server to provide web pages for human consumption (Table 10-6). Each server is configured by setting the network address and port number to