Trident system, Samza provides some primitives for building common types
of streaming applications and maintaining state within a processing
application.
Apache YARN
Rather than implement its own server management framework, Samza
off-loads much of its systems infrastructure onto Apache YARN. YARN,
which stands for Yet Another Resource Negotiator, is used to manage the
deployment, fault tolerance, and security of a Samza processing pipeline.
Background
The YARN project was originally born out of the limitations of the Hadoop
project. Hadoop was built around a JobTracker server that distributed
tasks (mappers and reducers) to other servers running the TaskTracker
server. A client that wanted to submit a job would connect to the
JobTracker and specify the input set, usually a distributed set of data
blocks hosted on the Hadoop Distributed File System (HDFS), as well as any
supporting code or data that needed to be distributed to each node. The
JobTracker would then break this request into small tasks and schedule each
of them on a TaskTracker, as shown in Figure 5.4.
Figure 5.4
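To make the centralized design concrete, here is a minimal Python sketch of JobTracker-style scheduling. It is an illustration under stated assumptions, not Hadoop's actual Java implementation: the `JobTracker` and `TaskTracker` classes, the `submit` and `schedule` methods, the one-map-task-per-block split, and the fixed per-tracker slot count are all hypothetical simplifications (reduce tasks, data locality, and failure handling are omitted). What it does show is that a single master holds the queue of every pending task and the state of every worker.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    """Hypothetical stand-in for a worker node with a fixed number of task slots."""
    name: str
    slots: int
    running: list = field(default_factory=list)

    def has_free_slot(self) -> bool:
        return len(self.running) < self.slots


@dataclass
class JobTracker:
    """Hypothetical single master that makes every scheduling decision."""
    trackers: list
    pending: deque = field(default_factory=deque)

    def submit(self, job_name: str, input_blocks: list) -> None:
        # Simplification: break the job into one map task per input block.
        for i, block in enumerate(input_blocks):
            self.pending.append(f"{job_name}-map-{i}:{block}")

    def schedule(self) -> list:
        """Assign pending tasks to trackers with free slots, one task per
        free tracker per pass, until tasks or free slots run out."""
        assignments = []
        while self.pending:
            free = [t for t in self.trackers if t.has_free_slot()]
            if not free:
                break  # cluster is full; remaining tasks stay queued
            for tracker in free:
                if not self.pending:
                    break
                task = self.pending.popleft()
                tracker.running.append(task)
                assignments.append((task, tracker.name))
        return assignments
```

For example, with `JobTracker([TaskTracker("tt1", 2), TaskTracker("tt2", 1)])` and a four-block job, three tasks are placed (two on `tt1`, one on `tt2`) and the fourth waits in the master's queue. Because all of this bookkeeping lives in one process, the master's memory and scheduling load grow with every server and every task in the cluster.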
This works well on modestly sized clusters, but it runs into a practical
limit at about 5,000 multicore servers. It also limits the total number of
tasks (either in a single job or spread