Processing Streaming Data - Real-Time Analytics

across many jobs) because information about each of the tasks needs to be kept in memory on the JobTracker itself.
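The bookkeeping problem can be illustrated with a toy model. The sketch below is not Hadoop code. `ToyJobTracker` and `TaskStatus` are hypothetical names, and it assumes a single process that holds one in-memory record per task for every running job. Its footprint therefore grows with the total number of tasks across the cluster, not with the number of jobs.

```python
from dataclasses import dataclass, field


@dataclass
class TaskStatus:
    job_id: str
    task_id: int
    state: str = "PENDING"


@dataclass
class ToyJobTracker:
    """Hypothetical model of a Hadoop 1 style JobTracker: one central
    process keeps a status record for every task of every running job."""
    tasks: dict = field(default_factory=dict)

    def submit_job(self, job_id: str, num_tasks: int) -> None:
        # Every task of every job gets an entry in the same central table.
        for t in range(num_tasks):
            self.tasks[(job_id, t)] = TaskStatus(job_id, t)

    def update(self, job_id: str, task_id: int, state: str) -> None:
        # All task status updates are funneled through this one object.
        self.tasks[(job_id, task_id)].state = state

    def tracked_records(self) -> int:
        return len(self.tasks)


tracker = ToyJobTracker()
tracker.submit_job("job-1", 2000)
tracker.submit_job("job-2", 3000)
print(tracker.tracked_records())  # 5000
```

Two jobs already mean 5,000 records in one process. Adding nodes to the cluster does nothing to spread that load.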

As physical hardware continued to scale and modern datacenters made it possible to host very large clusters, these limitations began to constrain how far Hadoop could grow. Additionally, new processing workloads, such as database-like applications and long-lived stream processing applications, were difficult to fit into Hadoop's processing model, which assumes a large number of short-lived mapper tasks coupled with a small set of longer-lived reducer tasks.
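The batch shape of that model can be sketched in a few lines of plain Python. This is a sequential illustration of the MapReduce pattern, not Hadoop's API. It assumes one mapper per input split and a small, fixed number of reducers chosen by hashing the key. The job only produces its answer once every mapper has finished and every reducer has run. A stream processor that never stops receiving input does not fit this shape.

```python
import zlib
from collections import defaultdict


def map_task(split):
    # A short-lived mapper: process one input split, emit (word, 1), exit.
    return [(word, 1) for line in split for word in line.split()]


def partition(key, num_reducers):
    # Deterministically route each key to one of a few reducers.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_task(pairs):
    # A longer-lived reducer: aggregate everything routed to it.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def run_job(splits, num_reducers=2):
    buckets = [[] for _ in range(num_reducers)]
    for split in splits:  # many mapper tasks, one per split
        for key, value in map_task(split):
            buckets[partition(key, num_reducers)].append((key, value))
    result = {}
    for bucket in buckets:  # a small, fixed set of reducer tasks
        # Each key lands in exactly one bucket, so merging is safe.
        result.update(reduce_task(bucket))
    return result


splits = [["a b a"], ["b c"], ["a"]]
print(run_job(splits))
```

The real framework runs mappers in parallel across the cluster. The dependency is the same, though: the reduce phase needs the complete map output, so the job has to have a finite input.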

To address the needs of both growing clusters and changing workloads, YARN (Yet Another Resource Negotiator) was developed as the resource-management layer of Hadoop 2.

Architecture

In the abstract, the YARN architecture is not so different from the original Hadoop infrastructure. Rather than a JobTracker and a TaskTracker, the top-level servers are now the ResourceManager and the NodeManager, respectively. The important difference is that these servers now manage applications, not individual tasks.
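The split between the two servers can be modeled as follows. This is a hypothetical toy, not the YARN API; real YARN components are Java services. The sketch assumes each NodeManager offers a fixed number of container slots. The ResourceManager records only which containers belong to which application, and it never sees the tasks that run inside them.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Hypothetical per-node agent offering a fixed number of container slots."""
    name: str
    capacity: int
    containers: list = field(default_factory=list)

    def launch(self, app_id: str) -> str:
        if len(self.containers) >= self.capacity:
            raise RuntimeError(f"{self.name} is full")
        container_id = f"{self.name}/c{len(self.containers)}"
        self.containers.append((container_id, app_id))
        return container_id


@dataclass
class ResourceManager:
    """Hypothetical cluster-wide scheduler: tracks applications and their
    containers, never the individual tasks running inside those containers."""
    nodes: list
    applications: dict = field(default_factory=dict)

    def allocate(self, app_id: str, count: int) -> list:
        # Fill nodes in order; grants fewer containers if the cluster is full.
        granted = []
        for node in self.nodes:
            while len(granted) < count and len(node.containers) < node.capacity:
                granted.append(node.launch(app_id))
        self.applications.setdefault(app_id, []).extend(granted)
        return granted


rm = ResourceManager([NodeManager("node1", 2), NodeManager("node2", 2)])
rm.allocate("app-1", 3)
rm.allocate("app-2", 1)
print(rm.applications)
```

The ResourceManager's state grows with the number of applications and containers, not with the number of tasks each application eventually runs.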

An application in YARN consists of an ApplicationMaster and a number of containers hosted on the cluster's nodes. The ApplicationMaster, as might be guessed from the name, is in charge of coordinating a job and managing its assigned containers, which host the individual tasks. The relationship between these components is shown in Figure 5.5.
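The ApplicationMaster's coordinating role can be sketched as a small scheduling loop. The class and method names here are hypothetical; real ApplicationMasters use the Java YARN client libraries. The sketch assumes the containers have already been granted. It places one pending task in each idle container and refills a container whenever its task reports completion.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationMaster:
    """Hypothetical per-application coordinator: assigns this job's tasks
    to its granted containers and tracks their progress."""
    app_id: str
    containers: list
    pending: list
    assignments: dict = field(default_factory=dict)
    completed: list = field(default_factory=list)

    def schedule(self) -> None:
        # Place one pending task into each idle container.
        for container in self.containers:
            if container not in self.assignments and self.pending:
                self.assignments[container] = self.pending.pop(0)

    def task_finished(self, container: str) -> None:
        # Record completion, then reuse the freed container.
        self.completed.append(self.assignments.pop(container))
        self.schedule()

    def done(self) -> bool:
        return not self.pending and not self.assignments


am = ApplicationMaster("app-1", ["node1/c0", "node2/c0"],
                       ["map-0", "map-1", "map-2"])
am.schedule()
while not am.done():
    # Simulate every busy container reporting that its task finished.
    for container in list(am.assignments):
        am.task_finished(container)
print(am.completed)  # ['map-0', 'map-1', 'map-2']
```

Three tasks share two containers here. Task-level state lives only in this per-application object, which is the responsibility YARN moved out of the cluster-wide servers.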

Figure 5.5

In Hadoop 2's Map-Reduce implementation, the ApplicationMaster takes over the role of the JobTracker. The difference is that each ApplicationMaster manages the tasks for only a single job instead of
