2.1.4 JobTracker
The JobTracker daemon is the liaison between your application and Hadoop. Once
you submit your code to your cluster, the JobTracker determines the execution plan:
it decides which files to process, assigns nodes to different tasks, and monitors all
tasks as they run. Should a task fail, the JobTracker automatically relaunches it,
possibly on a different node, up to a predefined limit of retries.
There is only one JobTracker daemon per Hadoop cluster. It's typically run on a
server as the master node of the cluster.
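As a concrete illustration, the following is a minimal sketch of handing a job to the JobTracker through the classic org.apache.hadoop.mapred API. The input and output paths and the job name are placeholders, and the two max-attempts properties are the classic names for the predefined retry limit mentioned above.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitJob.class);
        conf.setJobName("example-job");  // placeholder name

        // Input files the JobTracker inspects when planning map tasks.
        FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/output"));

        // The predefined retry limit: how many times a failed task is
        // relaunched (possibly on another node) before the job fails.
        conf.setInt("mapred.map.max.attempts", 4);
        conf.setInt("mapred.reduce.max.attempts", 4);

        // Blocks until the JobTracker reports the job complete.
        JobClient.runJob(conf);
    }
}

With no mapper or reducer set, the classic API falls back to its identity classes, so this sketch runs as-is and passes input records straight through to the output directory.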
2.1.5 TaskTracker
As with the storage daemons, the computing daemons also follow a master/slave
architecture: the JobTracker is the master, overseeing the overall execution of a
MapReduce job, and the TaskTrackers manage the execution of individual tasks on
each slave node. Figure 2.2 illustrates this interaction.
Each TaskTracker is responsible for executing the individual tasks that the JobTracker
assigns. Although there is a single TaskTracker per slave node, each TaskTracker can
spawn multiple JVMs to handle many map or reduce tasks in parallel.
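The per-node JVM count is capped by two classic "slot" properties, normally set in each slave's mapred-site.xml. The short sketch below uses illustrative values and simply reads the properties back through the Configuration API to show the names involved.

import org.apache.hadoop.conf.Configuration;

public class TaskSlots {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Classic (pre-YARN) slot caps: each occupied slot is a separate
        // child JVM on the slave node. Values here are examples only.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 4);
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 2);

        System.out.println("Parallel map JVMs:    "
                + conf.getInt("mapred.tasktracker.map.tasks.maximum", 2));
        System.out.println("Parallel reduce JVMs: "
                + conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2));
    }
}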
One responsibility of the TaskTracker is to constantly communicate with the
JobTracker. If the JobTracker fails to receive a heartbeat from a TaskTracker within a
specified amount of time, it will assume the TaskTracker has crashed and will resubmit
the corresponding tasks to other nodes in the cluster.
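The "specified amount of time" is itself configurable; in classic Hadoop it is the mapred.tasktracker.expiry.interval property (in milliseconds, ten minutes by default). The sketch below sets an illustrative value programmatically; in practice this setting belongs in mapred-site.xml on the master.

import org.apache.hadoop.conf.Configuration;

public class HeartbeatTimeout {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Declare a TaskTracker lost after 5 minutes of silence
        // (illustrative; the stock default is 600,000 ms = 10 minutes).
        conf.setLong("mapred.tasktracker.expiry.interval", 5 * 60 * 1000L);

        long expiryMs = conf.getLong("mapred.tasktracker.expiry.interval", 600000L);
        System.out.println("TaskTrackers expire after " + expiryMs
                + " ms without a heartbeat.");
    }
}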
Figure 2.2 JobTracker and TaskTracker interaction. After a client calls the
JobTracker to begin a data processing job, the JobTracker partitions the work
and assigns different map and reduce tasks to each TaskTracker in the cluster.