across many jobs) because information about each of the tasks needs to be
kept in memory on the JobTracker itself.
As physical hardware continued to scale and modern datacenters made it
possible to host very large clusters, these limitations began to take
their toll on Hadoop's scalability. Additionally, new processing workloads,
such as database-like applications and long-lived stream processing
applications, were difficult to match to Hadoop's processing
model, which assumes a long-lived but small set of reducer tasks coupled
with a large number of short-lived mapper tasks.
To address the needs of both growing clusters and changing workloads,
YARN was developed as part of Hadoop 2.
Architecture
In the abstract, the YARN architecture is not so different from the original
Hadoop infrastructure. Rather than a JobTracker and a TaskTracker,
the top-level servers are now the ResourceManager and the
NodeManager, respectively. The important difference is that these servers
now manage applications, not individual tasks.
An application in YARN consists of an ApplicationMaster and a number
of containers that are hosted on each node. The ApplicationMaster,
as might be guessed from the name, is in charge of coordinating a job
and managing its assigned containers, which host the individual tasks. The
relationship between these components is shown in Figure 5.5.
Figure 5.5
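The division of responsibility described above can be sketched as a small toy model. This is a conceptual illustration only, not the Hadoop API: the class names mirror the YARN components, but the methods (`submit`, `launch`), the job names, and the node names are assumptions made up for this example. The point it tries to show is that the cluster-wide server keeps one entry per application, while task-level bookkeeping lives in each application's own master.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A slot on one node that hosts a single task (toy model)."""
    node: str
    task: str


@dataclass
class ApplicationMaster:
    """Coordinates one job and tracks only that job's containers."""
    job_name: str
    containers: list = field(default_factory=list)

    def launch(self, node, task):
        # Hypothetical helper: record a container running `task` on `node`.
        container = Container(node=node, task=task)
        self.containers.append(container)
        return container


@dataclass
class ResourceManager:
    """Cluster-wide server: tracks applications, not individual tasks."""
    applications: dict = field(default_factory=dict)

    def submit(self, job_name):
        # Hypothetical helper: one ApplicationMaster is created per job.
        master = ApplicationMaster(job_name)
        self.applications[job_name] = master
        return master


rm = ResourceManager()

# First job: three tasks spread over two (made-up) nodes.
wordcount = rm.submit("wordcount")
for i, node in enumerate(["node-a", "node-b", "node-a"]):
    wordcount.launch(node, f"map-{i}")

# Second job: gets its own master, independent of the first.
sort_job = rm.submit("sort")
sort_job.launch("node-b", "map-0")

print(len(rm.applications))       # entries held by the ResourceManager
print(len(wordcount.containers))  # task-level state held by one master
```

In this sketch the number of entries held by the ResourceManager grows with the number of jobs rather than with the number of tasks, which is the property the text attributes to YARN. A real ResourceManager also schedules resources across nodes, which this model leaves out.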
In Hadoop 2's Map-Reduce implementation, the ApplicationMaster
serves as the JobTracker. The exception is that each
ApplicationMaster manages only the tasks for a single job instead of
 
 