Anatomy of a MapReduce Job Run
You can run a MapReduce job with a single method call: submit() on a Job object. (You can also call waitForCompletion(), which submits the job if it hasn't been submitted already, then waits for it to finish.) This single call conceals a great deal of processing behind the scenes. This section uncovers the steps Hadoop takes to run a job.
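For concreteness, a minimal driver that submits a job in this way might look like the following sketch. The class names WordCountDriver, WordCountMapper, and WordCountReducer and the use of word count are illustrative, not taken from this section, and the code assumes the Hadoop client libraries are on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        // Hypothetical mapper/reducer classes, assumed to be defined elsewhere.
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // waitForCompletion() submits the job if it hasn't been submitted
        // already, then blocks until it finishes; submit() alone would
        // return immediately after submission.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The boolean argument to waitForCompletion() controls whether progress is printed to the console as the job runs.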
The whole process is illustrated in Figure 7-1. At the highest level, there are five independent entities:
▪ The client, which submits the MapReduce job.
▪ The YARN resource manager, which coordinates the allocation of compute resources on the cluster.
▪ The YARN node managers, which launch and monitor the compute containers on machines in the cluster.
▪ The MapReduce application master, which coordinates the tasks running the MapReduce job. The application master and the MapReduce tasks run in containers that are scheduled by the resource manager and managed by the node managers.
▪ The distributed filesystem (normally HDFS, covered in Chapter 3), which is used for sharing job files between the other entities.
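On a live cluster, these entities can be observed with the standard YARN command-line tools. A sketch (the application ID shown is illustrative; actual IDs and output will vary):

```shell
# List the node managers registered with the resource manager
yarn node -list

# List running applications; a MapReduce job appears with
# application type MAPREDUCE, coordinated by its application master
yarn application -list -appStates RUNNING

# Show the status of a single application (ID is hypothetical)
yarn application -status application_1450724416000_0001
```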