If the job completed successfully, the job counters are displayed. Otherwise, the error that caused the job to fail is logged to the console.
The job submission process implemented by JobSubmitter does the following (a client-side sketch of driving these steps through the Job API follows the list):
▪ Asks the resource manager for a new application ID, used for the MapReduce job
ID (step 2).
▪ Checks the output specification of the job. For example, if the output directory
has not been specified or it already exists, the job is not submitted and an error is
thrown to the MapReduce program.
▪ Computes the input splits for the job. If the splits cannot be computed (because
the input paths don't exist, for example), the job is not submitted and an error is
thrown to the MapReduce program.
▪ Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the shared filesystem in a directory named after the job ID (step 3). The job JAR is copied with a high replication factor (controlled by the mapreduce.client.submit.file.replication property, which defaults to 10) so that there are lots of copies across the cluster for the node managers to access when they run tasks for the job.
▪ Submits the job by calling submitApplication() on the resource manager
(step 4).
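The sketch below shows the client-side view that triggers these steps. The class name, job name, and the /input and /output paths are placeholders, and the mapper/reducer setup is omitted (Hadoop's identity defaults apply); the Job, FileInputFormat, and FileOutputFormat calls and the mapreduce.client.submit.file.replication property are standard Hadoop APIs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Replication factor used for the submitted job resources
    // (job JAR, configuration, split metadata); 10 is the default.
    conf.setInt("mapreduce.client.submit.file.replication", 10);

    Job job = Job.getInstance(conf, "submit example");
    job.setJarByClass(SubmitExample.class);                   // JAR copied to the shared filesystem
    FileInputFormat.addInputPath(job, new Path("/input"));    // used to compute the input splits
    FileOutputFormat.setOutputPath(job, new Path("/output")); // must not already exist

    // waitForCompletion() drives JobSubmitter: it checks the output
    // specification, computes the splits, copies the job resources, calls
    // submitApplication() on the resource manager, and then polls for progress.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}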
Job Initialization
When the resource manager receives a call to its submitApplication() method, it
hands off the request to the YARN scheduler. The scheduler allocates a container, and the
resource manager then launches the application master's process there, under the node
manager's management (steps 5a and 5b).
The application master for MapReduce jobs is a Java application whose main class is MRAppMaster. It initializes the job by creating a number of bookkeeping objects to keep track of the job's progress, as it will receive progress and completion reports from the tasks (step 6). Next, it retrieves the input splits computed in the client from the shared filesystem (step 7). It then creates a map task object for each split, as well as a number of reduce task objects determined by the mapreduce.job.reduces property (set by the setNumReduceTasks() method on Job). Tasks are given IDs at this point.
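The map task count therefore follows the number of splits, while the reduce count must be set explicitly, in code or in configuration. A minimal sketch of the two equivalent ways to set it (the class and job names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReduceCountExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "reduce count example");

    // The application master creates one reduce task object per unit of this
    // count; the map task count comes from the input splits and is not set here.
    job.setNumReduceTasks(4); // writes mapreduce.job.reduces = 4

    // Equivalent to setting the property directly, e.g. -D mapreduce.job.reduces=4
    System.out.println(job.getConfiguration().get("mapreduce.job.reduces")); // prints 4
  }
}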
The application master must decide how to run the tasks that make up the MapReduce job. If the job is small, the application master may choose to run the tasks in the same JVM as itself. This happens when it judges that the overhead of allocating and running tasks in new containers outweighs the gain to be had in running them in parallel, compared to running them sequentially on one node. Such a job is said to be uberized, or run as an uber task.
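What counts as "small" is governed by configuration. A brief sketch of the relevant Hadoop properties, assuming the shipped defaults (uber mode is off unless explicitly enabled; the class and job names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UberTaskExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Allow small jobs to run in the application master's JVM ("uber" mode).
    conf.setBoolean("mapreduce.job.ubertask.enable", true);

    // Thresholds for what qualifies as small (shown with their default values):
    conf.setInt("mapreduce.job.ubertask.maxmaps", 9);    // at most 9 map tasks
    conf.setInt("mapreduce.job.ubertask.maxreduces", 1); // at most 1 reduce task
    // mapreduce.job.ubertask.maxbytes defaults to the filesystem block size.

    Job job = Job.getInstance(conf, "uber example");
    // ... set the mapper, reducer, and input/output paths as usual ...
  }
}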