managing a large number of jobs. This allows each ApplicationMaster
to operate independently in Map-Reduce settings. It also allows for more
sophisticated resource management and security models because jobs are
now essentially completely independent.
Relationship to Samza
Samza is implemented as an application on top of YARN. The Samza
application has the required ApplicationMaster that is used to manage
Samza TaskRunners hosted within YARN Containers. The
TaskRunners execute StreamTasks, which are the Samza equivalent of a
Storm Bolt.
All of Samza's communication passes through Kafka brokers. Like HDFS
DataNodes in a Hadoop Map-Reduce application, these brokers are usually
co-located on the same machines hosting the Samza Containers. Samza
then uses Kafka's topics and natural partitioning to implement many of the
grouping features found in stream processing applications.
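
To make the grouping idea concrete, the following is a minimal sketch of key-based partitioning, not Kafka's or Samza's actual implementation. It assumes a hypothetical topic with four partitions and uses CRC32 as a stand-in for the partitioner's hash function. The names partition_for, route, and count_by_key are illustrative only. The point it shows is that every message with the same key lands in the same partition, so whichever task reads that partition sees all of the messages for that key and can aggregate them locally.

```python
import zlib
from collections import defaultdict

# Assumption: an illustrative topic with four partitions.
NUM_PARTITIONS = 4


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a message key to a partition using a stable hash.

    CRC32 is a stand-in chosen because it is deterministic across runs;
    it is not the hash function a real Kafka producer uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions=NUM_PARTITIONS):
    """Group (key, value) messages by the partition their key hashes to."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


def count_by_key(partition_messages):
    """What a single task might do with its partition: count messages per key."""
    counts = defaultdict(int)
    for key, _value in partition_messages:
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [
        ("alice", "click"),
        ("bob", "view"),
        ("alice", "view"),
        ("carol", "click"),
        ("bob", "click"),
    ]
    # Each partition would be consumed by exactly one task.
    for partition, msgs in sorted(route(events).items()):
        print(partition, count_by_key(msgs))
```

Because the partition is a pure function of the key, no coordination between tasks is needed to compute a per-key aggregate. This is the same effect that a fields grouping provides in other stream processing systems.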
Getting Started with YARN and Samza
Although Hadoop 2 has been available for some time, it is still not
particularly common in production environments, though that is changing
rapidly. Most importantly for many users, Hadoop 2 is now supported by
Amazon's Elastic MapReduce product as a general release, making it easy to
spin up a cluster.
Apache YARN is also now supported by at least two of the major Hadoop
distributions, with more being added. Using their respective cluster
management tools to set up a YARN cluster is fairly painless. The only
downside is that packaged distributions tend to have a somewhat arbitrary
set of patches and versions that may lag the most recently released version
of the Apache project.
Additionally, it is possible to spin up a cluster using the Apache packages
either on a single node for experimentation or in a distributed fashion.
Single Node Samza
The easiest way to get started with Samza on a single node is to use the
single-node YARN installation packaged with the Hello Samza project. This