YARN was designed to address many of the limitations in MapReduce 1. The benefits to
using YARN include the following:
Scalability
YARN can run on larger clusters than MapReduce 1. MapReduce 1 hits scalability
bottlenecks in the region of 4,000 nodes and 40,000 tasks,[40] stemming from the fact that
the jobtracker has to manage both jobs and tasks. YARN overcomes these limitations
by virtue of its split resource manager/application master architecture: it is designed to
scale up to 10,000 nodes and 100,000 tasks.
In contrast to the jobtracker, each instance of an application — here, a MapReduce job
— has a dedicated application master, which runs for the duration of the application.
This model is actually closer to the original Google MapReduce paper, which describes
how a master process is started to coordinate map and reduce tasks running on a set of
workers.
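The split can be seen in the client-side submission API: the client asks the resource manager only for an application ID and a container in which to launch that application's own master process. The following is a minimal sketch using the YarnClient API from Hadoop 2; the application name and the run-application-master.sh command are placeholders, not part of any real application.

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitWithDedicatedAM {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Ask the resource manager for a new application ID and submission context.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("my-app");                       // placeholder name

    // Describe the container that will run this application's own master process.
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList("./run-application-master.sh"));
    appContext.setAMContainerSpec(amContainer);
    appContext.setResource(Resource.newInstance(1024, 1));          // 1 GB, 1 vcore for the AM
    appContext.setQueue("default");

    // The resource manager only allocates the container; the per-application master
    // it launches coordinates that application's tasks for the application's lifetime.
    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted application " + appId);

    yarnClient.stop();
  }
}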
Availability
High availability (HA) is usually achieved by replicating the state needed for another
daemon to take over the service if the active daemon fails. However, the large amount
of rapidly changing, complex state in the jobtracker's memory (each task status is
updated every few seconds, for example) makes it very difficult to retrofit HA into the
jobtracker service.
With the jobtracker's responsibilities split between the resource manager and application
master in YARN, making the service highly available became a divide-and-conquer
problem: provide HA for the resource manager, then for YARN applications (on a
per-application basis). And indeed, Hadoop 2 supports HA both for the resource manager
and for the application master for MapReduce jobs. Failure recovery in YARN is
discussed in more detail in Failures.
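Resource manager HA is driven by configuration rather than application code. The sketch below shows, as programmatic property settings, the kind of yarn-site.xml configuration involved in Hadoop 2: two resource manager instances, plus a ZooKeeper-backed state store so the new active resource manager can recover running applications. The host names are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmHaConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();

    // Two resource managers; the standby takes over if the active one fails.
    conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    conf.set("yarn.resourcemanager.hostname.rm1", "rm1.example.com");   // placeholder host
    conf.set("yarn.resourcemanager.hostname.rm2", "rm2.example.com");   // placeholder host

    // Persist application state so the newly active resource manager can
    // recover applications that were running when the old one failed.
    conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    conf.set("yarn.resourcemanager.zk-address",
        "zk1.example.com:2181,zk2.example.com:2181");                   // placeholder quorum
  }
}

Per-application recovery is handled separately: if a MapReduce application master fails, the resource manager starts a new attempt, which can recover the state of already-completed tasks rather than rerunning them.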
Utilization