Anatomy of a MapReduce Job Run
You can run a MapReduce job with a single method call: submit() on a Job object. (You can also call waitForCompletion(), which submits the job if it hasn't been submitted already, then waits for it to finish.) This single call conceals a great deal of processing behind the scenes. This section uncovers the steps Hadoop takes to run a job.
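For concreteness, a minimal driver that submits a job in this way might look like the following sketch. The class names WordCountDriver, WordCountMapper, and WordCountReducer and the use of word count are illustrative, not taken from this section, and the code assumes the Hadoop client libraries are on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        // Hypothetical mapper/reducer classes, assumed to be defined elsewhere.
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // waitForCompletion() submits the job if it hasn't been submitted
        // already, then blocks until it finishes; submit() alone would
        // return immediately after submission.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The boolean argument to waitForCompletion() controls whether progress is printed to the console as the job runs.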
The whole process is illustrated in Figure 7-1. At the highest level, there are five independent entities:
▪ The client, which submits the MapReduce job.
▪ The YARN resource manager, which coordinates the allocation of compute resources on the cluster.
▪ The YARN node managers, which launch and monitor the compute containers on machines in the cluster.
▪ The MapReduce application master, which coordinates the tasks running the MapReduce job. The application master and the MapReduce tasks run in containers that are scheduled by the resource manager and managed by the node managers.
▪ The distributed filesystem (normally HDFS, covered in Chapter 3), which is used for sharing job files between the other entities.
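On a live cluster, these entities can be observed with the standard YARN command-line tools. A sketch (the application ID shown is illustrative; actual IDs and output will vary):

```shell
# List the node managers registered with the resource manager
yarn node -list

# List running applications; a MapReduce job appears with
# application type MAPREDUCE, coordinated by its application master
yarn application -list -appStates RUNNING

# Show the status of a single application (ID is hypothetical)
yarn application -status application_1450724416000_0001
```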