Trident system, Samza provides some primitives for building common types
of streaming applications and maintaining state within a processing
application.
Apache YARN
Rather than implement its own server management framework, Samza
off-loads much of its systems infrastructure onto Apache YARN. YARN,
which stands for Yet Another Resource Negotiator, is used to manage the
deployment, fault tolerance, and security of a Samza processing pipeline.
Background
The YARN project was originally born out of the limitations of the Hadoop
project. The Hadoop project was built around a JobTracker server that
managed the distribution of tasks (mappers and reducers) to other servers
running the TaskTracker server. A client that wanted to submit a job
would connect to the JobTracker and specify the input set, usually a
distributed set of data blocks hosted on Hadoop's distributed file system, as
well as any supporting code or data that needed to be distributed to each
node. The JobTracker would then break this request into small tasks and
schedule each of them on a TaskTracker, as shown in Figure 5.4.
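The division of labor described above can be sketched as a toy model. This is only an illustration under stated assumptions, not Hadoop's actual API: the function name schedule_job, the block names, and the tracker names are hypothetical, and the round-robin placement stands in for Hadoop's real scheduling, which also considers factors such as where the data blocks live.

```python
from collections import defaultdict
from itertools import cycle


def schedule_job(input_blocks, task_trackers):
    """Toy JobTracker: create one map task per input block and
    assign the tasks to TaskTrackers in round-robin order.

    Hypothetical helper for illustration; not part of Hadoop.
    """
    if not task_trackers:
        raise ValueError("at least one TaskTracker is required")
    assignments = defaultdict(list)
    # A single coordinator makes every placement decision itself.
    for block, tracker in zip(input_blocks, cycle(task_trackers)):
        assignments[tracker].append(f"map({block})")
    return dict(assignments)


plan = schedule_job(
    input_blocks=["block-0", "block-1", "block-2", "block-3", "block-4"],
    task_trackers=["tracker-a", "tracker-b"],
)
for tracker, tasks in sorted(plan.items()):
    print(tracker, tasks)
```

Even in this small sketch, every task passes through one scheduling loop, which hints at why a single JobTracker becomes a bottleneck as the number of servers and tasks grows.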
This works well on modestly sized clusters, but because a single JobTracker
must schedule and monitor every task, the design reaches a practical limit
at about 5,000 multicore servers. It also places practical
limitations on the total number of tasks (either in a single job or spread