In MapReduce 1, each tasktracker is configured with a static allocation of fixed-size
“slots,” which are divided into map slots and reduce slots at configuration time. A map
slot can only be used to run a map task, and a reduce slot can only be used for a reduce
task.
In YARN, a node manager manages a pool of resources, rather than a fixed number of
designated slots. MapReduce running on YARN will not hit the situation where a
reduce task has to wait because only map slots are available on the cluster, which can
happen in MapReduce 1. If the resources to run the task are available, then the
application will be eligible for them.
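The difference between typed slots and a shared pool can be sketched as a toy model. This is only an illustration, not Hadoop code: the class names SlotNode and PoolNode, the can_run method, and the memory figures are all hypothetical, and memory stands in for whatever resources a node manager tracks.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """MapReduce 1 style (hypothetical model): slots are typed and fixed at configuration time."""
    free_map_slots: int
    free_reduce_slots: int

    def can_run(self, task_type: str) -> bool:
        # A task may only use a free slot of its own type.
        if task_type == "map":
            return self.free_map_slots > 0
        return self.free_reduce_slots > 0


@dataclass
class PoolNode:
    """YARN style (hypothetical model): one untyped pool of memory usable by any task."""
    free_memory_mb: int

    def can_run(self, task_type: str, memory_mb: int) -> bool:
        # task_type is deliberately ignored: the pool is not divided by task kind.
        return self.free_memory_mb >= memory_mb


# Both nodes have spare capacity, but the slot-based node has only map slots free.
mr1_node = SlotNode(free_map_slots=2, free_reduce_slots=0)
yarn_node = PoolNode(free_memory_mb=2048)

print(mr1_node.can_run("reduce"))         # False: the reduce task must wait
print(yarn_node.can_run("reduce", 1024))  # True: enough free memory in the pool
```

In this sketch the slot-based node turns away the reduce task even though it has idle capacity, which is the waiting situation described above; the pooled node only asks whether enough resources remain.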
Furthermore, resources in YARN are fine-grained, so an application can request
what it needs, rather than an indivisible slot, which may be too big (wasting
resources) or too small (possibly causing the task to fail) for the particular task.
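The too-big and too-small cases can be made concrete with a little arithmetic. The sketch below is a simplified model under stated assumptions: the 2048 MB slot size and the task sizes are invented for illustration, the function names are hypothetical, and a real scheduler may round requests to its own allocation increments, which this model ignores.

```python
from typing import Optional

SLOT_MB = 2048  # hypothetical fixed slot size, chosen only for illustration


def wasted_in_slot(task_mb: int, slot_mb: int = SLOT_MB) -> Optional[int]:
    """Memory left unused when a task runs in a fixed slot, or None if it does not fit."""
    if task_mb > slot_mb:
        return None  # slot too small: the task cannot run in it
    return slot_mb - task_mb


def wasted_in_request(task_mb: int) -> int:
    """With a request sized to the task, nothing is left over in this simplified model."""
    requested_mb = task_mb
    return requested_mb - task_mb


for task_mb in (512, 2048, 3072):
    print(task_mb, wasted_in_slot(task_mb), wasted_in_request(task_mb))
# 512 1536 0
# 2048 0 0
# 3072 None 0
```

The 512 MB task leaves three quarters of the slot idle, and the 3072 MB task does not fit in the slot at all, whereas a request sized to the task avoids both outcomes in this model.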
Multitenancy
In some ways, the biggest benefit of YARN is that it opens up Hadoop to other types of
distributed application beyond MapReduce. MapReduce is just one YARN application
among many.
It is even possible for users to run different versions of MapReduce on the same YARN
cluster, which makes the process of upgrading MapReduce more manageable. (Note,
however, that some parts of MapReduce, such as the job history server and the shuffle
handler, as well as YARN itself, still need to be upgraded across the cluster.)
Since Hadoop 2 is widely used and is the latest stable version, in the rest of this topic the
term “MapReduce” refers to MapReduce 2 unless otherwise stated. Chapter 7 looks in
detail at how MapReduce running on YARN works.