job.name=word-count
# YARN
yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz
# Task
task.class=wiley.streaming.samza.WordCountTask
task.inputs=kafka.wikipedia-words
task.window.ms=10000
# Serializers
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
# Systems
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.samza.msg.serde=json
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.consumer.auto.offset.reset=largest
systems.kafka.producer.metadata.broker.list=localhost:9092
systems.kafka.producer.producer.type=sync
systems.kafka.producer.batch.num.messages=1
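In the listing above, systems.kafka.samza.msg.serde=json does not name a class directly. It refers to the entry registered under serializers.registry.json.class. The following is a minimal sketch of that indirection using only java.util.Properties. The class SerdeLookup and its methods are hypothetical names introduced for illustration, and this is not how Samza itself resolves serdes internally.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Samza: follows the serde name used by a
// system to the factory class registered under that name.
public class SerdeLookup {

    // Parses properties-file text into a Properties object.
    static Properties parse(String text) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return config;
    }

    // Looks up systems.<system>.samza.msg.serde, then resolves that name
    // through serializers.registry.<name>.class.
    static String serdeFactoryFor(Properties config, String system) {
        String serdeName = config.getProperty("systems." + system + ".samza.msg.serde");
        if (serdeName == null) {
            throw new IllegalArgumentException("No msg.serde configured for system " + system);
        }
        String factory = config.getProperty("serializers.registry." + serdeName + ".class");
        if (factory == null) {
            throw new IllegalArgumentException("Serde '" + serdeName + "' is not in the registry");
        }
        return factory;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory",
            "systems.kafka.samza.msg.serde=json");
        // prints org.apache.samza.serializers.JsonSerdeFactory
        System.out.println(serdeFactoryFor(parse(text), "kafka"));
    }
}
```

A check like this can catch a serde name that is referenced by a system but never registered, before the job is submitted.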
Packaging a Job for YARN
To package Jobs for YARN, you must create a distribution archive. This
archive contains not only the JAR file for the Job implementation, but also
all of the configuration files and dependencies for the project. It also
includes shell scripts for starting Jobs using YARN. YARN distributes this
archive to each of the nodes that will run the Job.
The easiest way to construct this archive is to use the Maven assembly
plug-in. This plug-in is added to the plugins section of the pom.xml file:
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.3</version>
<configuration>
<descriptors>
<descriptor>src/main/assembly/src.xml</descriptor>
</descriptors>