When configuring memory parameters it's very useful to be able to monitor a task's actual memory usage during a job run, and this is possible via MapReduce task counters. The counters PHYSICAL_MEMORY_BYTES, VIRTUAL_MEMORY_BYTES, and COMMITTED_HEAP_BYTES (described in Table 9-2) provide snapshot values of memory usage and are therefore suitable for observation during the course of a task attempt.
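For example, a job's counters can be retrieved from the command line with the mapred job -counter subcommand; these task counters live in the org.apache.hadoop.mapreduce.TaskCounter group. This is a sketch against a running cluster, and the job ID shown is illustrative:

```shell
# Print the aggregated physical memory snapshot for a job's tasks.
# Replace the job ID with one from your cluster.
mapred job -counter job_1414159253450_0002 \
    org.apache.hadoop.mapreduce.TaskCounter PHYSICAL_MEMORY_BYTES
```

The same counters are also visible per task attempt in the MapReduce web UI, which is often more convenient for watching usage evolve during a run.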
Hadoop also provides settings to control how much memory is used for MapReduce operations. These can be set on a per-job basis and are covered in Shuffle and Sort.
CPU settings in YARN and MapReduce
In addition to memory, YARN treats CPU usage as a managed resource, and applications can request the number of cores they need. The number of cores that a node manager can allocate to containers is controlled by the yarn.nodemanager.resource.cpu-vcores property. It should be set to the total number of cores on the machine, minus a core for each daemon process running on the machine (datanode, node manager, and any other long-running processes).
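As a sketch of this rule, a hypothetical 16-core worker running a datanode and a node manager would be given 16 − 2 = 14 vcores in yarn-site.xml:

```xml
<!-- yarn-site.xml: 16 cores, minus one each for the datanode
     and node manager daemons, leaves 14 for containers -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>14</value>
</property>
```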
MapReduce jobs can control the number of cores allocated to map and reduce containers by setting mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores. Both default to 1, an appropriate setting for normal single-threaded MapReduce tasks, which can only saturate a single core.
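A job that runs multithreaded map tasks might request more than the default; as an illustrative per-job setting (the value here is an assumption for a task using two threads):

```xml
<!-- Per-job configuration: request two cores for each map container -->
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>2</value>
</property>
```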
WARNING
While the number of cores is tracked during scheduling (so a container won't be allocated on a machine where there are no spare cores, for example), the node manager will not, by default, limit actual CPU usage of running containers. This means that a container can abuse its allocation by using more CPU than it was given, possibly starving other containers running on the same host. YARN has support for enforcing CPU limits using Linux cgroups. The node manager's container executor class (yarn.nodemanager.container-executor.class) must be set to use the LinuxContainerExecutor class, which in turn must be configured to use cgroups (see the properties under yarn.nodemanager.linux-container-executor).
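A minimal yarn-site.xml sketch of that setup might look like the following; the resources-handler property and class names reflect the Hadoop 2 conventions for the yarn.nodemanager.linux-container-executor prefix, so check them against your version's defaults:

```xml
<!-- yarn-site.xml: switch to the Linux container executor ... -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<!-- ... and have it enforce resource limits via cgroups -->
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
```

Note that the LinuxContainerExecutor also has operational prerequisites of its own (such as the setuid container-executor binary), so enabling it involves more than these two properties.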
Hadoop Daemon Addresses and Ports
Hadoop daemons generally run both an RPC server for communication between daemons (Table 10-5) and an HTTP server to provide web pages for human consumption (Table 10-6). Each server is configured by setting the network address and port number to