job.name=word-count
# YARN
yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz
# Task
task.class=wiley.streaming.samza.WordCountTask
task.inputs=kafka.wikipedia-words
task.window.ms=10000
# Serializers
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
# Systems
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.samza.msg.serde=json
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.consumer.auto.offset.reset=largest
systems.kafka.producer.metadata.broker.list=localhost:9092
systems.kafka.producer.producer.type=sync
systems.kafka.producer.batch.num.messages=1
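In a properties file like this one, each key and its value need to sit on one logical line. A value that wraps onto the next line without a trailing backslash is read as an empty value followed by a separate, unrelated key. The following is a minimal sketch of that behavior using only the standard java.util.Properties class, on the assumption that the job file is parsed with standard properties semantics. JobConfigCheck and its list of required keys are hypothetical and are not part of Samza.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Samza: parses job properties text with
// java.util.Properties and reports keys that are missing or empty.
final class JobConfigCheck {

    // Keys from the listing that this sketch treats as required (an assumption).
    static final String[] REQUIRED_KEYS = {
        "job.name",
        "yarn.package.path",
        "task.class",
        "task.inputs",
        "systems.kafka.samza.factory"
    };

    // Parse properties text using the standard java.util.Properties rules.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Return every required key that is absent or has a blank value.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Value wrapped onto the next line with no trailing backslash:
        // task.class gets an empty value and the class name becomes its own key.
        String wrapped = "task.class=\nwiley.streaming.samza.WordCountTask\n";
        // The same entry joined with a backslash continuation.
        String continued = "task.class=\\\n  wiley.streaming.samza.WordCountTask\n";

        System.out.println("[" + parse(wrapped).getProperty("task.class") + "]");
        // prints "[]"
        System.out.println("[" + parse(continued).getProperty("task.class") + "]");
        // prints "[wiley.streaming.samza.WordCountTask]"
    }
}
```

Running a check like missingKeys against the job file before packaging would catch a wrapped yarn.package.path or task.class early, rather than when the job is submitted.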
Packaging a Job for YARN
To package Jobs for YARN, you must create a distribution archive. This
archive contains not only the JAR file for the Job implementation, but also
all of the configuration files and dependencies for the project. It also
includes shell scripts for starting Jobs using YARN. YARN distributes this
archive to each of the nodes that will run the Job.
The easiest way to construct this archive is to use the Maven assembly
plug-in. This plug-in is added to the plugins section of the pom.xml file:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.3</version>
  <configuration>
    <descriptors>
      <descriptor>src/main/assembly/src.xml</descriptor>
    </descriptors>