YARN Compared to MapReduce 1
The distributed implementation of MapReduce in the original version of Hadoop (version 1
and earlier) is sometimes referred to as “MapReduce 1” to distinguish it from MapReduce
2, the implementation that uses YARN (in Hadoop 2 and later).
NOTE
It's important to realize that the old and new MapReduce APIs are not the same thing as the MapReduce 1
and MapReduce 2 implementations. The APIs are user-facing client-side features and determine how you
write MapReduce programs (see Appendix D), whereas the implementations are just different ways of
running MapReduce programs. All four combinations are supported: both the old and new MapReduce
APIs run on both MapReduce 1 and 2.
In MapReduce 1, there are two types of daemon that control the job execution process: a
jobtracker and one or more tasktrackers. The jobtracker coordinates all the jobs run on the
system by scheduling tasks to run on tasktrackers. Tasktrackers run tasks and send progress
reports to the jobtracker, which keeps a record of the overall progress of each job. If a task
fails, the jobtracker can reschedule it on a different tasktracker.
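The interaction described above can be sketched as a toy model. This is a minimal illustration only, not Hadoop code: the class ToyJobTracker, its methods, and the tasktracker names are all hypothetical, and the round-robin placement is an assumption made for simplicity (the text says only that a failed task can be rescheduled on a different tasktracker).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not part of any Hadoop API.
public class ToyJobTracker {
    // Task id -> tasktracker the task is currently assigned to.
    private final Map<String, String> assignments = new LinkedHashMap<>();
    // Task id -> last progress value (0.0 to 1.0) reported for the task.
    private final Map<String, Double> progress = new LinkedHashMap<>();
    private final List<String> trackers;
    private int next = 0; // index of the next tasktracker to try (round-robin)

    public ToyJobTracker(List<String> trackers) {
        this.trackers = new ArrayList<>(trackers);
    }

    // Scheduling: match a task with a tasktracker.
    public String schedule(String taskId) {
        return assign(taskId, null);
    }

    // Progress monitoring: a tasktracker reports how far a task has got.
    public void reportProgress(String taskId, double fraction) {
        progress.put(taskId, fraction);
    }

    // Failure handling: move the task to a different tasktracker.
    public String reportFailure(String taskId) {
        return assign(taskId, assignments.get(taskId));
    }

    // Overall job progress, taken here as the mean of the task progress values.
    public double jobProgress() {
        return progress.values().stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    // Picks the next tasktracker in round-robin order, skipping `exclude`.
    private String assign(String taskId, String exclude) {
        for (int i = 0; i < trackers.size(); i++) {
            String candidate = trackers.get((next + i) % trackers.size());
            if (!candidate.equals(exclude)) {
                next = (next + i + 1) % trackers.size();
                assignments.put(taskId, candidate);
                progress.put(taskId, 0.0); // a (re)started task begins from zero
                return candidate;
            }
        }
        throw new IllegalStateException("no other tasktracker available");
    }

    public static void main(String[] args) {
        ToyJobTracker jt = new ToyJobTracker(Arrays.asList("tt1", "tt2"));
        System.out.println("map-0 scheduled on " + jt.schedule("map-0"));
        jt.reportProgress("map-0", 0.5);
        System.out.println("map-0 rescheduled on " + jt.reportFailure("map-0"));
        System.out.println("job progress: " + jt.jobProgress());
    }
}
```

Note that a single object holds both the scheduling state and the progress bookkeeping, which mirrors the way the jobtracker combines both roles in MapReduce 1.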
In MapReduce 1, the jobtracker takes care of both job scheduling (matching tasks with
tasktrackers) and task progress monitoring (keeping track of tasks, restarting failed or slow
tasks, and doing task bookkeeping, such as maintaining counter totals). By contrast, in
YARN these responsibilities are handled by separate entities: the resource manager and an
application master (one for each MapReduce job). The jobtracker is also responsible for
storing job history for completed jobs, although it is possible to run a job history server as a
separate daemon to take the load off the jobtracker. In YARN, the equivalent role is the
timeline server, which stores application history. [39]
The YARN equivalent of a tasktracker is a node manager. The mapping is summarized in
Table 4-1.
Table 4-1. A comparison of MapReduce 1 and YARN components

MapReduce 1    YARN
Jobtracker     Resource manager, application master, timeline server
Tasktracker    Node manager
Slot           Container
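
The mapping in Table 4-1 can also be written down as a small lookup, which may be handy when translating MapReduce 1 terminology in scripts or notes. This is only a sketch: the class ComponentMapping and the method yarnEquivalents are hypothetical names introduced for illustration, and the entries are copied from the table rather than taken from any Hadoop API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative lookup only: hypothetical names, not part of any Hadoop API.
public class ComponentMapping {
    // MapReduce 1 component -> YARN component(s), as listed in Table 4-1.
    static final Map<String, List<String>> MR1_TO_YARN = new LinkedHashMap<>();
    static {
        MR1_TO_YARN.put("jobtracker",
                Arrays.asList("resource manager", "application master", "timeline server"));
        MR1_TO_YARN.put("tasktracker", Arrays.asList("node manager"));
        MR1_TO_YARN.put("slot", Arrays.asList("container"));
    }

    // Returns the YARN counterparts of a MapReduce 1 component (case-insensitive).
    static List<String> yarnEquivalents(String mr1Component) {
        List<String> result = MR1_TO_YARN.get(mr1Component.toLowerCase(Locale.ROOT));
        if (result == null) {
            throw new IllegalArgumentException("unknown MapReduce 1 component: " + mr1Component);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String component : MR1_TO_YARN.keySet()) {
            System.out.println(component + " -> " + yarnEquivalents(component));
        }
    }
}
```

The jobtracker is the only entry with more than one counterpart, reflecting the way YARN splits its responsibilities across separate entities.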