Processing Streaming Data - Real-Time Analytics

job.name=word-count

# YARN

yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz

# Task

task.class=wiley.streaming.samza.WordCountTask

task.inputs=kafka.wikipedia-words

task.window.ms=10000

# Serializers

serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory

# Systems

systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory

systems.kafka.samza.msg.serde=json

systems.kafka.consumer.zookeeper.connect=localhost:2181/

systems.kafka.consumer.auto.offset.reset=largest

systems.kafka.producer.metadata.broker.list=localhost:9092

systems.kafka.producer.producer.type=sync

systems.kafka.producer.batch.num.messages=1
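The keys in this listing are hierarchical: everything under systems.kafka. configures the Kafka system, task.inputs names its input as <system>.<stream>, and task.window.ms sets how often the task's window method is invoked. As a rough illustration only, the following sketch uses plain java.util.Properties (not Samza's own configuration classes) to read an abbreviated copy of the listing. The class name JobConfigSketch and the helper subset are hypothetical names introduced for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class JobConfigSketch {

    // Abbreviated copy of the listing above; the real file has more keys.
    static final String CONFIG = String.join("\n",
        "job.name=word-count",
        "task.class=wiley.streaming.samza.WordCountTask",
        "task.inputs=kafka.wikipedia-words",
        "task.window.ms=10000",
        "systems.kafka.samza.msg.serde=json",
        "systems.kafka.producer.metadata.broker.list=localhost:9092");

    // Parses properties text; an in-memory reader should never fail,
    // so any IOException is rethrown unchecked.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Hypothetical helper: returns the entries under a prefix,
    // with the prefix stripped from each key.
    static Map<String, String> subset(Properties props, String prefix) {
        Map<String, String> out = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                out.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties props = load(CONFIG);

        // task.inputs has the form <system>.<stream>
        String input = props.getProperty("task.inputs");
        int dot = input.indexOf('.');
        System.out.println("system=" + input.substring(0, dot)
            + " stream=" + input.substring(dot + 1));

        // The window interval is given in milliseconds.
        long windowMs = Long.parseLong(props.getProperty("task.window.ms"));
        System.out.println("window interval: " + (windowMs / 1000) + " s");

        // All settings that belong to the system named "kafka".
        System.out.println(subset(props, "systems.kafka."));
    }
}
```

Grouping by prefix this way mirrors how the file is meant to be read: the name after systems. (here kafka) is the same name used as the system part of task.inputs.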

Packaging a Job for YARN

To package Jobs for YARN, you must create a distribution archive. This archive contains not only the JAR file for the Job implementation, but also all of the configuration files and dependencies for the project. It also includes shell scripts for starting Jobs using YARN. YARN distributes this archive to each of the nodes that will run the Job.
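The distribution archive is the file that yarn.package.path in the job configuration points at. That path contains ${...} placeholders that are presumably filled in at build time. The following sketch shows what the resolved path might look like. The class PackagePathSketch, the helper resolve, and all three placeholder values are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PackagePathSketch {

    // Replaces each ${name} placeholder with its value.
    // Placeholders without a matching entry are left untouched.
    static String resolve(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical values; a real build supplies its own.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("basedir", "/home/user/word-count");
        values.put("project.artifactId", "word-count");
        values.put("pom.version", "1.0");

        // The template is the yarn.package.path value from the job configuration.
        String template =
            "file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz";

        // prints file:///home/user/word-count/target/word-count-1.0-dist.tar.gz
        System.out.println(resolve(template, values));
    }
}
```

Whatever produces the archive has to write it to exactly this location and name, otherwise YARN will not find the package.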

The easiest way to construct this archive is to use the Maven assembly plug-in. This plug-in is added to the plugins section of the pom.xml file:

<artifactId>maven-assembly-plugin</artifactId>
<configuration>
  <descriptors>
    <descriptor>src/main/assembly/src.xml</descriptor>
  </descriptors>
</configuration>
