Samza Jobs
With a cluster configured and data being imported into the Kafka portion of
the grid, it is time to implement a Samza Job. A Job, in Samza parlance,
is roughly equivalent to a Bolt in Storm. However, rather than being
assembled into a Topology, Jobs in Samza are all independent entities,
and any composition is simply a matter of reading from or writing to a
particular Kafka topic.
On the one hand, this potentially allows for significantly more efficient use
of resources and easier management of highly interconnected flows. For
example, a single Samza Job can be responsible for taking an input stream
and producing a “valid” input stream that can be used by any number of
downstream Job implementations. New processes can also be added easily
to take advantage of these streams, whereas before they might have had to
reprocess the entire raw stream of data just to apply a small change.
The downside of this approach is that the structure of the topology of jobs
is now purely abstract. In the Storm model, a topology is a distinct thing,
and all of the processing steps are controlled and monitored from a central
location. In the Samza model, the topology is implied in the arrangement of
topics, but never explicitly stated. This can lead to problems with managing
changes in upstream Job implementations or the inadvertent introduction
of cycles into the topology's structure.
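Because the topology exists only as an arrangement of topics, one way to guard against accidental cycles is to write down each Job's input and output topics and check the resulting graph before deploying. The following is a minimal sketch in plain Java, not part of Samza itself; the JobSpec and ImpliedTopology names are hypothetical and introduced only for illustration. It treats each Job as linking every topic it reads to every topic it writes and runs a depth-first search for a path that returns to a topic already being visited.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical description of one Job: its name plus the topics it reads and writes. */
final class JobSpec {
    final String name;
    final List<String> inputs;
    final List<String> outputs;

    JobSpec(String name, List<String> inputs, List<String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }
}

/** Hypothetical helper that reconstructs the implied topology from Job declarations. */
final class ImpliedTopology {

    /** Builds topic-to-topic edges: each Job links every input topic to every output topic. */
    static Map<String, Set<String>> topicEdges(List<JobSpec> jobs) {
        Map<String, Set<String>> edges = new HashMap<>();
        for (JobSpec job : jobs) {
            for (String in : job.inputs) {
                edges.computeIfAbsent(in, k -> new HashSet<>()).addAll(job.outputs);
            }
        }
        return edges;
    }

    /** Returns true if following topics downstream can lead back to a topic already on the path. */
    static boolean hasCycle(List<JobSpec> jobs) {
        Map<String, Set<String>> edges = topicEdges(jobs);
        Set<String> onPath = new HashSet<>();    // topics on the current search path
        Set<String> finished = new HashSet<>();  // topics already fully explored
        for (String topic : new ArrayList<>(edges.keySet())) {
            if (visit(topic, edges, onPath, finished)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String topic, Map<String, Set<String>> edges,
                                 Set<String> onPath, Set<String> finished) {
        if (onPath.contains(topic)) {
            return true;   // reached a topic we are still exploring: a cycle
        }
        if (finished.contains(topic)) {
            return false;  // already explored without finding a cycle
        }
        onPath.add(topic);
        for (String next : edges.getOrDefault(topic, Collections.<String>emptySet())) {
            if (visit(next, edges, onPath, finished)) {
                return true;
            }
        }
        onPath.remove(topic);
        finished.add(topic);
        return false;
    }
}
```

Calling ImpliedTopology.hasCycle with the declarations of every deployed Job reports whether any chain of topics feeds back into itself. A check of this kind is only as accurate as the declarations it is given, so it complements rather than replaces the central view that a Storm topology provides.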
Preparing a Job Application
A Samza Job is made up of two pieces. The first is the actual code, which
is an implementation of the StreamTask interface. The second piece is
the configuration of the Task, including the names of the input streams and
the configuration of logging, monitoring, and other ancillary facilities. Any
number of Job implementations can be hosted and packaged together for
ease of deployment.
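As a rough illustration of the second piece, the configuration for a single Job might look something like the properties fragment below. This is a sketch under stated assumptions rather than a complete configuration: the job name, task class, and topic name are hypothetical, and a real deployment would also need serialization, logging, and monitoring settings appropriate to the installed Samza version.

```properties
# Hypothetical job name, used to identify the Job when it is submitted.
job.name=validate-events
# Run the Job on YARN; this assumes the YARN-based grid.
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory

# The StreamTask implementation to run (hypothetical class name).
task.class=com.example.samza.ValidateEventsTask
# Input streams, written as <system>.<stream>; "raw-events" is a hypothetical topic.
task.inputs=kafka.raw-events

# Defines the system named "kafka" that task.inputs refers to.
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
```

Because several Jobs can be packaged together, each Job would typically get its own file of this kind, all pointing at classes in the same package.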
If you are not using the grid implementation described previously, you will
need to install the Samza Maven packages. These packages are not yet
available on Maven Central or any other Maven repository, so they need to be
built and installed locally. Check out the Git repository for Samza and then
run the Gradle script using the provided gradlew driver: