Scheduling and Workflow - Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

Now, use the Hadoop file system put command to copy the share directory onto HDFS under /user/oozie:

[oozie@hc1nn ooziesharelib]$ hdfs dfs -put share /user/oozie/share

Starting the Oozie server is simple: run the Linux service command as the root user. Use the Linux su command to switch to root, start the Oozie service, and then exit the root session:

[hadoop@hc1nn ooziesharelib]$ su -

[root@hc1nn ~]$ service oozie start

[root@hc1nn ~]$ exit

Finally, you can use the Oozie client as the Linux hadoop user to access Oozie and check the server's status:

[hadoop@hc1nn ~]$ oozie admin -oozie http://localhost:11000/oozie -status

System mode: NORMAL

[hadoop@hc1nn ~]$ oozie admin -oozie http://localhost:11000/oozie -version

Oozie server build version: 3.3.2-cdh4.7.0

By setting the OOZIE_URL variable, you can simplify the Oozie client commands. The URL gives the Oozie client the host name and port of the Oozie server, so the -oozie option can be omitted, as follows:

[hadoop@hc1nn ~]$ export OOZIE_URL=http://localhost:11000/oozie

[hadoop@hc1nn ~]$ oozie admin -version

Oozie server build version: 3.3.2-cdh4.7.0
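As a minimal sketch of the same idea in script form, the following sets OOZIE_URL only when it is not already defined and then runs the status check without the -oozie option. The status_cmd variable and the fallback messages are illustrative assumptions, not part of the original setup, and the script only calls the client if it is found on the PATH.

```shell
# Fall back to the chapter's local server URL when OOZIE_URL is not already set.
: "${OOZIE_URL:=http://localhost:11000/oozie}"
export OOZIE_URL

# With OOZIE_URL exported, the -oozie option can be left off each command.
# (status_cmd is a hypothetical helper variable used only in this sketch.)
status_cmd="oozie admin -status"

if command -v oozie >/dev/null 2>&1; then
    # Run the client; report a problem if the server cannot be reached.
    $status_cmd || echo "Could not reach the Oozie server at $OOZIE_URL"
else
    echo "oozie client not found; would run: $status_cmd (server: $OOZIE_URL)"
fi
```

If you want the setting to persist across logins, one option is to place the export line in the hadoop user's shell profile.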

At this point, you can access the Oozie web console via the URL http://localhost:11000/oozie. (I discuss this in more detail following the discussion of workflows in Oozie.)

The Mechanics of the Oozie Workflow

In general, the workflow is a set of chained actions that call scripts stored on HDFS, such as Pig and Hive scripts. All input comes from HDFS, not from the Linux file system, because Oozie cannot guarantee which cluster nodes will be used to process the workflow. Created as an XML document, an Oozie workflow script contains a series of linked actions; each action's success or failure transition, together with the control nodes, determines where the control flow moves next. The fork control node, for example, allows actions to be run in parallel. You can configure the script to send notifications of the workflow outcome via email or output message, as well as set action parameters and add tool-specific actions like Pig, Hive, and Java to the workflow.
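To make the structure described above concrete, here is a hedged sketch that writes a small workflow document to a temporary file. It forks into two Pig actions that run in parallel and then joins them before ending. The node names, the script names a.pig and b.pig, the ${jobTracker} and ${nameNode} parameters, and the schema version are all assumptions chosen for illustration; they would need to match your own cluster and job properties.

```shell
# Write a hypothetical workflow to a temporary file. The quoted 'EOF' stops the
# shell from expanding the ${...} placeholders, which Oozie resolves at run time.
wf_fork="$(mktemp)"
cat > "$wf_fork" <<'EOF'
<workflow-app name="fork-example" xmlns="uri:oozie:workflow:0.2">
    <start to="pig-fork"/>
    <!-- The fork node starts both Pig actions in parallel. -->
    <fork name="pig-fork">
        <path start="pig-a"/>
        <path start="pig-b"/>
    </fork>
    <action name="pig-a">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>a.pig</script>
        </pig>
        <ok to="pig-join"/>
        <error to="fail"/>
    </action>
    <action name="pig-b">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>b.pig</script>
        </pig>
        <ok to="pig-join"/>
        <error to="fail"/>
    </action>
    <!-- The join node waits for both paths before moving on. -->
    <join name="pig-join" to="end"/>
    <kill name="fail">
        <message>Workflow failed at [${wf:lastErrorNode()}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF

# List the parallel paths that the fork node starts.
echo "Wrote $wf_fork"
grep -o '<path start="[^"]*"' "$wf_fork"
```

Each action names one node to move to on success (ok) and one on failure (error), which is how the pass/fail flow is chained together.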

Oozie Workflow Control Nodes

The workflow control nodes are like traffic cops in a script, directing the flow of work. The start control node defines the starting point for the workflow. Each workflow script can have only one start node, and it must point to an existing action.

The end control node is also mandatory and indicates the end of the workflow. If the control flow reaches the end control node, the workflow has finished successfully.
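The rules for the start and end nodes can be illustrated with a small sanity check. The sketch below writes a minimal, hypothetical workflow (one fs action between start and end) and then uses grep and sed to count the start and end nodes and confirm that the start node points to a node that exists. This is only a text-based check under the assumption of one element per line; it is not a substitute for validating the file against the Oozie schema, and the node names and directory path are invented for the example.

```shell
# Minimal hypothetical workflow: start -> one fs action -> end,
# with a kill node to catch failures.
wf_min="$(mktemp)"
cat > "$wf_min" <<'EOF'
<workflow-app name="minimal-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="make-dir"/>
    <action name="make-dir">
        <fs>
            <mkdir path="${nameNode}/user/hadoop/example-out"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed at [${wf:lastErrorNode()}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF

# Count the lines that contain a start node and an end node (expect one each).
start_count=$(grep -c '<start ' "$wf_min")
end_count=$(grep -c '<end ' "$wf_min")

# Extract the node that start transitions to, then check that it is defined.
start_target=$(sed -n 's/.*<start to="\([^"]*\)".*/\1/p' "$wf_min")
if grep -q "name=\"$start_target\"" "$wf_min"; then
    target_ok=yes
else
    target_ok=no
fi

echo "start nodes: $start_count, end nodes: $end_count, start target defined: $target_ok"
```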
