YARN
To run YARN, you need to designate one machine as a resource manager. The simplest
way to do this is to set the property yarn.resourcemanager.hostname to the
hostname or IP address of the machine running the resource manager. Many of the
resource manager's server addresses are derived from this property. For example,
yarn.resourcemanager.address takes the form of a host-port pair, and the host
defaults to yarn.resourcemanager.hostname. In a MapReduce client configuration,
this property is used to connect to the resource manager over RPC.
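For instance, a minimal yarn-site.xml might contain nothing but this property (the hostname resourcemanager.example.com is a placeholder for your own machine):

  <?xml version="1.0"?>
  <configuration>
    <!-- Hostname of the machine running the resource manager; the other
         resource manager server addresses are derived from this value. -->
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>resourcemanager.example.com</value>
    </property>
  </configuration>

With this single setting, the derived addresses fall back to their default ports on that host; yarn.resourcemanager.address, for example, defaults to port 8032 there.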
During a MapReduce job, intermediate data and working files are written to temporary
local files. Because this data includes the potentially very large output of map tasks, you
need to ensure that the yarn.nodemanager.local-dirs property, which controls
the location of local temporary storage for YARN containers, is configured to use disk
partitions that are large enough. The property takes a comma-separated list of directory
names, and you should use all available local disks to spread disk I/O (the directories are
used in round-robin fashion). Typically, you will use the same disks and partitions (but
different directories) for YARN local storage as you use for datanode block storage, as
governed by the dfs.datanode.data.dir property, which was discussed earlier.
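As an illustration, a node manager with three data disks might list one directory per disk (the paths below are hypothetical and should match your own mount points):

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <!-- One directory per physical disk; containers' temporary files
         are spread across them in round-robin fashion. -->
    <value>/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir</value>
  </property>

Unlike datanode block storage, this data is transient, so the directories only need enough free space to hold the intermediate output of currently running jobs.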
Unlike MapReduce 1, YARN doesn't have tasktrackers to serve map outputs to reduce
tasks, so for this function it relies on shuffle handlers, which are long-running auxiliary
services running in node managers. Because YARN is a general-purpose service, the
MapReduce shuffle handlers need to be enabled explicitly in yarn-site.xml by setting the
yarn.nodemanager.aux-services property to mapreduce_shuffle.
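A sketch of the corresponding yarn-site.xml entries follows. The second property maps the service name to its handler class; in many Hadoop versions it is already the default and can be omitted, but it makes the registration explicit:

  <property>
    <!-- Register the MapReduce shuffle handler as an auxiliary service
         in each node manager. -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>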
Table 10-3 summarizes the important configuration properties for YARN. The resource-
related settings are covered in more detail in the next sections.