across many jobs) because information about each of the tasks needs to be
kept in memory on the JobTracker itself.
As physical hardware continued to scale and modern datacenters made it
possible to host very large clusters, these limitations began to take
their toll on Hadoop's scalability. Additionally, new processing workloads,
such as database-like applications and long-lived stream processing
applications, were somewhat difficult to match to Hadoop's processing
model, which assumes a small set of long-lived reducer tasks coupled
with a large number of short-lived mapper tasks.
To address the needs of both growing clusters and changing workloads,
YARN was developed as part of Hadoop 2.
Architecture
In the abstract, the YARN architecture is not so different from the original
Hadoop infrastructure. Rather than a JobTracker and a TaskTracker, the
top-level servers are now the ResourceManager and the NodeManager,
respectively. The important difference is that these servers now manage
applications, not individual tasks.
An application in YARN consists of an ApplicationMaster and a number
of containers that are hosted on each node. The ApplicationMaster,
as might be guessed from the name, is in charge of coordinating a job
and managing its assigned containers, which host the individual tasks. The
relationship between these components is shown in Figure 5.5.
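The division of labor described above can be sketched as a toy model. This is a simplified illustration in Python, not the Hadoop API: the class names mirror the YARN components, but every method, field, and the "most free capacity" placement policy are assumptions made for the example. The point it tries to show is that the ResourceManager only hands out containers, while the per-task bookkeeping lives in the ApplicationMaster of each application.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A slot on one node that hosts a single task of one application."""
    node: str
    app_id: str


@dataclass
class NodeManager:
    """Per-node server; capacity is a hypothetical container limit."""
    name: str
    capacity: int
    containers: list = field(default_factory=list)

    def free(self):
        return self.capacity - len(self.containers)

    def launch(self, app_id):
        # Refuse the request when the node has no room left.
        if self.free() <= 0:
            return None
        container = Container(self.name, app_id)
        self.containers.append(container)
        return container


class ResourceManager:
    """Cluster-wide server: grants containers, keeps no per-task state."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, count):
        granted = []
        for _ in range(count):
            # Toy placement policy (an assumption): pick the emptiest node.
            node = max(self.nodes, key=lambda n: n.free())
            container = node.launch(app_id)
            if container is None:
                break  # cluster is full; grant fewer than requested
            granted.append(container)
        return granted


class ApplicationMaster:
    """Per-application coordinator: tracks the tasks of a single job."""

    def __init__(self, app_id, resource_manager):
        self.app_id = app_id
        self.resource_manager = resource_manager
        self.assignments = {}

    def run(self, tasks):
        containers = self.resource_manager.allocate(self.app_id, len(tasks))
        # Tasks beyond the granted containers are left unassigned here.
        self.assignments = dict(zip(tasks, containers))
        return self.assignments


nodes = [NodeManager("node-1", 2), NodeManager("node-2", 2)]
rm = ResourceManager(nodes)
am = ApplicationMaster("app-1", rm)
assignments = am.run(["map-0", "map-1", "reduce-0"])
for task, container in assignments.items():
    print(task, "runs on", container.node)
```

In this sketch the only place that knows which task runs where is `am.assignments`. The ResourceManager sees applications and containers, which is the property the text credits for relieving the memory pressure on a single central server.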
In Hadoop 2's Map-Reduce implementation, the ApplicationMaster serves as
the JobTracker. The exception is that each ApplicationMaster
only manages the tasks for a single job instead of