$ git clone http://git-wip-us.apache.org/repos/asf/incubator-samza.git
$ cd incubator-samza
$ ./gradlew -PscalaVersion=2.8.1 clean publishToMavenLocal
When this is complete, the project containing the Job implementation should be updated to include the samza-api dependency in its pom.xml file:
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-api</artifactId>
<version>0.7.0</version>
</dependency>
Configuring a Job
A Samza Job is configured through a Properties file that is passed to the Samza framework when the Job is submitted to YARN. The Properties file starts with a Job factory class specification and a Job name, along with a distribution package:
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.name=wordcount-split
yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz
The factory class will generally be YarnJobFactory, and the name is set here to wordcount-split, the Job implemented in the next section. The yarn.package.path value is filled in by the build process and specifies the path of an archive that YARN transfers to each of the nodes. This archive contains the code that implements the Job, along with any support JAR files it might need.
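Because a missing entry in this file is otherwise only noticed at submission time, it can be useful to check the three opening properties before handing the file to YARN. The following is a minimal sketch using only the standard java.util.Properties class. The class name JobConfigCheck, the method missingKeys, and the /tmp/example-dist.tar.gz path are hypothetical names chosen for illustration and are not part of Samza.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Samza: checks that the three
// properties that open the Job configuration file are present.
public class JobConfigCheck {

    // The keys described in the text as opening the Properties file.
    public static final String[] REQUIRED_KEYS = {
        "job.factory.class", "job.name", "yarn.package.path"
    };

    // Parses Properties-format text and returns the required keys
    // that are absent or have a blank value, in REQUIRED_KEYS order.
    public static List<String> missingKeys(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Example input; the package path here is an assumed placeholder,
        // standing in for the value the build process would fill in.
        String text = String.join("\n",
            "job.factory.class=org.apache.samza.job.yarn.YarnJobFactory",
            "job.name=wordcount-split",
            "yarn.package.path=file:///tmp/example-dist.tar.gz");
        System.out.println("Missing keys: " + missingKeys(text));
    }
}
```

In a real project the text would come from the filtered Properties file produced by the build rather than from an inline string, and the list of required keys would grow as the task and stream properties are added.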
Next, the task is defined. This consists, minimally, of the task.class and task.inputs properties: