2.1.4 JobTracker
The JobTracker daemon is the liaison between your application and Hadoop. Once
you submit your code to your cluster, the JobTracker determines the execution plan:
it decides which files to process, assigns nodes to different tasks, and monitors all
tasks as they run. Should a task fail, the JobTracker automatically relaunches it,
possibly on a different node, up to a predefined limit of retries.
There is only one JobTracker daemon per Hadoop cluster. It's typically run on a
server as the master node of the cluster.
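As a concrete illustration, the following is a minimal sketch of handing a job to the JobTracker through the classic org.apache.hadoop.mapred API. The input and output paths and the job name are placeholders, and the two max-attempts properties are the classic names for the predefined retry limit mentioned above.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitJob.class);
        conf.setJobName("example-job");  // placeholder name

        // Input files the JobTracker inspects when planning map tasks.
        FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/output"));

        // The predefined retry limit: how many times a failed task is
        // relaunched (possibly on another node) before the job fails.
        conf.setInt("mapred.map.max.attempts", 4);
        conf.setInt("mapred.reduce.max.attempts", 4);

        // Blocks until the JobTracker reports the job complete.
        JobClient.runJob(conf);
    }
}

With no mapper or reducer set, the classic API falls back to its identity classes, so this sketch runs as-is and passes input records straight through to the output directory.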
2.1.5 TaskTracker
As with the storage daemons, the computing daemons also follow a master/slave
architecture: the JobTracker is the master, overseeing the overall execution of a
MapReduce job, and the TaskTrackers manage the execution of individual tasks on
each slave node. Figure 2.2 illustrates this interaction.
Each TaskTracker is responsible for executing the individual tasks that the JobTracker
assigns. Although there is a single TaskTracker per slave node, each TaskTracker can
spawn multiple JVMs to handle many map or reduce tasks in parallel.
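The per-node JVM count is capped by two classic "slot" properties, normally set in each slave's mapred-site.xml. The short sketch below uses illustrative values and simply reads the properties back through the Configuration API to show the names involved.

import org.apache.hadoop.conf.Configuration;

public class TaskSlots {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Classic (pre-YARN) slot caps: each occupied slot is a separate
        // child JVM on the slave node. Values here are examples only.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 4);
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 2);

        System.out.println("Parallel map JVMs:    "
                + conf.getInt("mapred.tasktracker.map.tasks.maximum", 2));
        System.out.println("Parallel reduce JVMs: "
                + conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2));
    }
}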
One responsibility of the TaskTracker is to constantly communicate with the
JobTracker. If the JobTracker fails to receive a heartbeat from a TaskTracker within a
specified amount of time, it will assume the TaskTracker has crashed and will resubmit
the corresponding tasks to other nodes in the cluster.
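The "specified amount of time" is itself configurable; in classic Hadoop it is the mapred.tasktracker.expiry.interval property (in milliseconds, ten minutes by default). The sketch below sets an illustrative value programmatically; in practice this setting belongs in mapred-site.xml on the master.

import org.apache.hadoop.conf.Configuration;

public class HeartbeatTimeout {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Declare a TaskTracker lost after 5 minutes of silence
        // (illustrative; the stock default is 600,000 ms = 10 minutes).
        conf.setLong("mapred.tasktracker.expiry.interval", 5 * 60 * 1000L);

        long expiryMs = conf.getLong("mapred.tasktracker.expiry.interval", 600000L);
        System.out.println("TaskTrackers expire after " + expiryMs
                + " ms without a heartbeat.");
    }
}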
Figure 2.2 JobTracker and TaskTracker interaction. After a client calls the
JobTracker to begin a data processing job, the JobTracker partitions the work
and assigns different map and reduce tasks to each TaskTracker in the cluster.