The cluster-level Job Tracker handles client requests via a Map Reduce (MR) API. Clients need only submit their work
through the MR API, because the Map Reduce framework handles scheduling, resource allocation, and failover in the event
of a crash. The Job Tracker runs jobs via Task Trackers, which reside on the data nodes and manage the actual tasks or
processes. The Job Tracker manages the whole client-requested job, passing subtasks to individual slave nodes and
monitoring both the nodes' availability and the tasks' completion.
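To make the processing model concrete, the following is a minimal sketch of a word count expressed as map, shuffle, and reduce steps. It is plain Python, not the Hadoop MR API: the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical, and run_job merely stands in for the framework work that Hadoop spreads across the Task Trackers.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def run_job(lines):
    """Hypothetical stand-in for the framework: map, shuffle, then reduce."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["Hadoop runs Map Reduce", "Map Reduce runs on Hadoop"])
```

The client code here supplies only the map and reduce logic; everything inside run_job corresponds to the scheduling and data movement that the text says the framework takes care of.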
Hadoop V1 scales only to clusters of around 4,000 to 5,000 nodes, and it also limits the number of concurrent
processes that can run. It has only a single processing type, Map Reduce, which, although powerful, does
not allow for requirements like graph or real-time processing.
The Differences in Hadoop V2
With YARN in Hadoop V2, the V1 Job Tracker has been split into a master Resource Manager and slave-based Application
Master processes. This split separates the two major tasks of the Job Tracker: resource management and monitoring/scheduling.
The Job History server now has the function of providing information about completed jobs. The Task Tracker has
been replaced by a slave-based Node Manager, which handles slave node-based resources and manages tasks on
the node. The actual tasks reside within containers launched by the Node Manager. The Map Reduce function is
controlled by the Application Master process, while the tasks themselves may be either Map or Reduce tasks.
Hadoop V2 also offers the ability to use non-Map Reduce processing, like Apache Giraph for graph processing, or
Impala for data query. Resources on YARN can be shared among all three processing systems (Map Reduce, Giraph, and Impala).
Figure 2-2 shows client task requests being sent to the global Resource Manager and the slave-based Node
Managers launching containers, which hold the actual tasks. The Node Managers also monitor the containers' resource usage.
The Application Master requests containers from the scheduler and receives status updates from the container-based Map Reduce tasks.
Figure 2-2. Hadoop V2 architecture
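The division of labor in the figure can be sketched as a toy model. This is an illustration only, not YARN code: the class names mirror the component names in the text, but every method, the memory-only resource model, and the first-fit placement rule are assumptions made for the example. Real YARN schedulers are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    node: str        # name of the node that launched this container
    task: str        # e.g. "map-0" or "reduce-0"
    memory_mb: int


@dataclass
class NodeManager:
    name: str
    capacity_mb: int
    containers: list = field(default_factory=list)

    def used_mb(self):
        """Monitor resource usage of the containers on this node."""
        return sum(c.memory_mb for c in self.containers)

    def launch(self, task, memory_mb):
        """Launch a container that holds one task."""
        container = Container(self.name, task, memory_mb)
        self.containers.append(container)
        return container


class ResourceManager:
    """Global resource management (assumed first-fit, memory only)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, task, memory_mb):
        for node in self.nodes:
            if node.capacity_mb - node.used_mb() >= memory_mb:
                return node.launch(task, memory_mb)
        return None  # no node has enough free memory


class ApplicationMaster:
    """Per-job process: requests containers and tracks task placement."""

    def __init__(self, resource_manager):
        self.rm = resource_manager
        self.status = {}

    def run(self, tasks, memory_mb):
        for task in tasks:
            container = self.rm.allocate(task, memory_mb)
            self.status[task] = container.node if container else "pending"
        return self.status


nodes = [NodeManager("node1", 2048), NodeManager("node2", 2048)]
am = ApplicationMaster(ResourceManager(nodes))
placement = am.run(["map-0", "map-1", "map-2", "reduce-0"], memory_mb=1024)
```

In this sketch the Resource Manager knows nothing about Map or Reduce tasks; it only hands out capacity. The job-specific logic lives in the Application Master, which is what lets other processing types share the same cluster resources.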
This architecture enables Hadoop V2 to scale to much larger clusters and provides the ability to have a higher
number of concurrent processes. It also now offers the ability, as mentioned earlier, to run different types of processes
concurrently, not just Map Reduce.
 