Anatomy of a MapReduce Job Run
You can run a MapReduce job with a single method call: submit() on a Job object (you can also call waitForCompletion(), which submits the job if it hasn't been submitted already, then waits for it to finish).[51] This method call conceals a great deal of processing behind the scenes. This section uncovers the steps Hadoop takes to run a job.
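As a concrete illustration, the following driver sketch shows the two calls side by side. The class name, job name, and command-line paths are placeholders; with no mapper or reducer set, Hadoop's defaults simply pass each input line through, keyed by its byte offset.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal driver sketch: the default (identity) mapper and reducer pass
// each input line through unchanged, keyed by its offset in the file.
public class SubmitSketch {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "submit-sketch");
    job.setJarByClass(SubmitSketch.class);
    job.setOutputKeyClass(LongWritable.class); // key type produced by the default TextInputFormat
    job.setOutputValueClass(Text.class);       // value type (the line contents)
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input path from the command line
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path (must not exist yet)

    // Option 1: submit and return immediately, polling for status yourself.
    // job.submit();

    // Option 2: submit the job (if it hasn't been submitted already) and
    // block until it finishes, printing progress to the console as it runs.
    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
}

Either call hands the job off to the framework; the rest of this section describes what happens from that point on.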
The whole process is illustrated in Figure 7-1. At the highest level, there are five independent entities:[52]
▪ The client, which submits the MapReduce job.
▪ The YARN resource manager, which coordinates the allocation of compute resources on the cluster.
▪ The YARN node managers, which launch and monitor the compute containers on machines in the cluster.
▪ The MapReduce application master, which coordinates the tasks running the MapReduce job. The application master and the MapReduce tasks run in containers that are scheduled by the resource manager and managed by the node managers.
▪ The distributed filesystem (normally HDFS, covered in Chapter 3), which is used for sharing job files between the other entities. (A brief client-side configuration sketch follows this list.)
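To tie these entities back to the client, here is a hedged configuration sketch. The hostnames are placeholders, and in a real deployment these properties would normally be supplied by the cluster's configuration files rather than set in code; the sketch only shows that the client is pointed at the YARN resource manager for scheduling and at HDFS for sharing job files.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ClusterWiringSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // In practice these values come from core-site.xml, mapred-site.xml, and
    // yarn-site.xml on the client's classpath; they are set explicitly here
    // only to show which entity each property points the client at.
    conf.set("mapreduce.framework.name", "yarn");                // submit via the YARN resource manager
    conf.set("yarn.resourcemanager.hostname", "rm.example.com"); // illustrative resource manager host
    conf.set("fs.defaultFS", "hdfs://nn.example.com:8020");      // shared filesystem for job resources

    Job job = Job.getInstance(conf, "cluster-wiring-sketch");
    System.out.println("Filesystem for job files: " + job.getConfiguration().get("fs.defaultFS"));
  }
}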