Unlike JobControl, which runs on the client machine submitting the jobs, Oozie runs
as a service in the cluster, and clients submit workflow definitions for immediate or later
execution. In Oozie parlance, a workflow is a DAG of action nodes and control-flow
nodes.
An action node performs a workflow task, such as moving files in HDFS; running a
MapReduce, Streaming, Pig, or Hive job; performing a Sqoop import; or running an
arbitrary shell script or Java program. A control-flow node governs the workflow execution
between actions by allowing such constructs as conditional logic (so different execution
branches may be followed depending on the result of an earlier action node) or parallel
execution. When the workflow completes, Oozie can make an HTTP callback to the client
to inform it of the workflow status. It is also possible to receive callbacks every time the
workflow enters or exits an action node.
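The control-flow constructs described above can be sketched as a small workflow. This is a minimal illustration, not taken from the original text: the node names (check-input, split, task-a, task-b, merge, fail), the inputDir parameter, and the scratch paths are all hypothetical, and the fs actions merely stand in for real work. It assumes the workflow 0.1 schema and the fs:exists EL function.

```xml
<!-- Hypothetical sketch: node names, inputDir, and paths are illustrative only. -->
<workflow-app xmlns="uri:oozie:workflow:0.1" name="control-flow-sketch">
  <start to="check-input"/>

  <!-- Conditional logic: follow the first case whose predicate is true. -->
  <decision name="check-input">
    <switch>
      <case to="split">${fs:exists(inputDir)}</case>
      <default to="end"/>
    </switch>
  </decision>

  <!-- Parallel execution: both paths are started together. -->
  <fork name="split">
    <path start="task-a"/>
    <path start="task-b"/>
  </fork>

  <!-- Action nodes: placeholder HDFS operations standing in for real jobs. -->
  <action name="task-a">
    <fs>
      <mkdir path="${nameNode}/user/${wf:user()}/sketch-a"/>
    </fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <action name="task-b">
    <fs>
      <mkdir path="${nameNode}/user/${wf:user()}/sketch-b"/>
    </fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>

  <!-- The join waits for every forked path before continuing. -->
  <join name="merge" to="end"/>

  <kill name="fail">
    <message>Action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Here the decision node chooses a branch from an EL predicate, and the fork and join pair brackets the actions that run in parallel; each action names the node to go to on success and on failure.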
Defining an Oozie workflow
Workflow definitions are written in XML using the Hadoop Process Definition Language,
the specification for which can be found on the Oozie website. Example 6-14 shows a
simple Oozie workflow definition for running a single MapReduce job.
Example 6-14. Oozie workflow definition to run the maximum temperature MapReduce job
<workflow-app xmlns="uri:oozie:workflow:0.1" name="max-temp-workflow">
  <start to="max-temp-mr"/>
  <action name="max-temp-mr">
    <map-reduce>
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/${wf:user()}/output"/>
      </prepare>
      <configuration>
        <property>
          <name>mapred.mapper.new-api</name>
          <value>true</value>
        </property>
        <property>
          <name>mapred.reducer.new-api</name>
          <value>true</value>
        </property>
        <property>
          <name>mapreduce.job.map.class</name>
          <value>MaxTemperatureMapper</value>
        </property>
        <property>