Processing Streaming Data - Real-Time Analytics

Database Reference

In-Depth Information

task.class=wiley.streaming.samza.WordSplitTask

task.inputs=kafka.wikipedia-raw

It can also include properties for a task's checkpoint feature, which uses

Kafka to record the task's state and allows Samza to recover in the event of a

failure:

task.checkpoint.factory=

org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory

task.checkpoint.system=kafka

task.checkpoint.replication.factor=1

Next up is Samza's comprehensive metric reporting system. It is entirely

optionaland,atthemoment,hasbotha snapshot and jmx flavoravailable.

This example uses both flavors at the same time:

metrics.reporters=snapshot,jmx

metrics.reporter.snapshot.class=

org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory

metrics.reporter.snapshot.stream=kafka.metrics

metrics.reporter.jmx.class=

org.apache.samza.metrics.reporter.JmxReporterFactory

TheSerializationsectionofthe Job configurationallowsforthedefinitionof

the various Serializers and Deserializers (these are often combined

into “SerDe” in processing jargon) used in the Job . This example uses the

built-in JSON SerDe as well as the Metrics SerDe used by Samza for its

metric snapshots:

serializers.registry.json.class=

org.apache.samza.serializers.JsonSerdeFactory

serializers.registry.metrics.class=

org.apache.samza.serializers.MetricsSnapshotSerdeFactory

Finally, the Input and Output systems are defined. These implement

streams in the Samza environment and are pluggable systems, though only

the Kafka stream is shipped with Samza at the moment. This configuration

defines the Kafka system used by task.inputs as well as the output

Search WWH ::

Custom Search

Home