Figure 6-4. Transition diagram of an Oozie workflow
All workflows must have one start and one end node. When the workflow job starts, it
transitions to the node specified by the start node (the max-temp-mr action in this
example). A workflow job succeeds when it transitions to the end node. However, if the
workflow job transitions to a kill node, it is considered to have failed and reports the
appropriate error message specified by the message element in the workflow definition.
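The control flow just described might be arranged as in the following sketch of a workflow definition (the element names follow the Oozie workflow XML schema; the action name max-temp-mr matches this example, but the action body is abbreviated and the kill-node message text is illustrative):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.1" name="max-temp-workflow">
  <!-- The workflow job begins at the node named by the start element -->
  <start to="max-temp-mr"/>
  <action name="max-temp-mr">
    <map-reduce>
      <!-- job-tracker, name-node, prepare, and configuration elements go here -->
    </map-reduce>
    <!-- Transition on success or failure of the action -->
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <!-- Reaching a kill node marks the workflow job as failed -->
  <kill name="fail">
    <message>MapReduce failed</message>
  </kill>
  <!-- Reaching the end node marks the workflow job as succeeded -->
  <end name="end"/>
</workflow-app>
```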
The bulk of this workflow definition file specifies the map-reduce action. The first two
elements, job-tracker and name-node , are used to specify the YARN resource
manager (or jobtracker in Hadoop 1) to submit the job to and the namenode (actually a
Hadoop filesystem URI) for input and output data. Both are parameterized so that the
workflow definition is not tied to a particular cluster (which makes it easy to test). The
parameters are specified as workflow job properties at submission time, as we shall see
later.
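In the workflow definition, those first two elements might look like this (the property names resourceManager and nameNode are illustrative; any names may be used, provided they match the properties supplied when the workflow job is submitted):

```xml
<map-reduce>
  <!-- Parameterized so the definition is not tied to one cluster -->
  <job-tracker>${resourceManager}</job-tracker>
  <name-node>${nameNode}</name-node>
  ...
</map-reduce>
```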
WARNING
Despite its name, the job-tracker element is used to specify a YARN resource manager address and
port.
The optional prepare element runs before the MapReduce job and is used for directory
deletion (and creation, too, if needed, although that is not shown here). By ensuring that
the output directory is in a consistent state before running a job, Oozie can safely rerun the
action if the job fails.
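A prepare element that deletes the job's output directory might be sketched as follows (the output path shown is illustrative, not taken from the example):

```xml
<prepare>
  <!-- Remove any stale output so a rerun starts from a clean state -->
  <delete path="${nameNode}/user/hadoop/output"/>
</prepare>
```

Because the delete is a no-op when the directory is absent, the action is safe to run repeatedly.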
The MapReduce job to run is specified in the configuration element using nested
elements for specifying the Hadoop configuration name-value pairs. You can view the
MapReduce configuration section as a declarative replacement for the driver classes that
we have used elsewhere in this book for running MapReduce programs (such as
Example 2-5).
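Each name-value pair is expressed as a property element, one per Hadoop configuration setting. A sketch of the structure (the property names and values here are illustrative, not the example's actual configuration):

```xml
<configuration>
  <property>
    <name>mapreduce.job.queuename</name>
    <value>default</value>
  </property>
  <property>
    <name>mapreduce.input.fileinputformat.inputdir</name>
    <value>/user/hadoop/input/ncdc</value>
  </property>
  ...
</configuration>
```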
We have taken advantage of JSP Expression Language (EL) syntax in several places in the
workflow definition. Oozie provides a set of functions for interacting with the workflow.