Creating an Oozie Workflow
In this example, I examine and run a Pig- and Hive-based Oozie workflow. The example uses a Canadian
vehicle fuel-consumption data set that is provided at the website data.gc.ca. You can either
search for "Fuel Consumption Ratings" to find the data set or use the link http://open.canada.ca/data/en/
dataset/98f1a129-f628-4ce4-b24d-6f16bf24dd64.
To begin, I download the English version of each CSV file. I have downloaded these files using the
Linux hadoop account, placing them in that account's Downloads directory, as the Linux ls command shows:
[hadoop@hc1nn Downloads]$ ls
MY1995-1999 Fuel Consumption Ratings.csv MY2007 Fuel Consumption Ratings.csv
MY2000 Fuel Consumption Ratings.csv MY2008 Fuel Consumption Ratings.csv
MY2001 Fuel Consumption Ratings.csv MY2009 Fuel Consumption Ratings.csv
MY2002 Fuel Consumption Ratings.csv MY2010 Fuel Consumption Ratings.csv
MY2003 Fuel Consumption Ratings.csv MY2011 Fuel Consumption Ratings.csv
MY2004 Fuel Consumption Ratings.csv MY2012 Fuel Consumption Ratings.csv
MY2005 Fuel Consumption Ratings.csv MY2013 Fuel Consumption Ratings.csv
MY2006 Fuel Consumption Ratings.csv MY2014 Fuel Consumption Ratings.csv
I then need to copy these files to an HDFS directory so that they can be used by an Oozie workflow job. To do this,
I create some HDFS directories, as follows:
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir /user/hadoop/oozie_wf
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir /user/hadoop/oozie_wf/fuel
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir /user/hadoop/oozie_wf/fuel/rawdata
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir /user/hadoop/oozie_wf/fuel/pigwf
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir /user/hadoop/oozie_wf/fuel/entity
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir /user/hadoop/oozie_wf/fuel/entity/manufacturer
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir /user/hadoop/oozie_wf/fuel/entity/model
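As an aside, this sequence can be shortened. Recent Hadoop releases accept a -p option to mkdir that creates
any missing parent directories along the path, so only the leaf directories need to be named. A minimal
sketch, assuming your hdfs version supports -p, follows:
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir -p /user/hadoop/oozie_wf/fuel/rawdata
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir -p /user/hadoop/oozie_wf/fuel/pigwf
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir -p /user/hadoop/oozie_wf/fuel/entity/manufacturer
[hadoop@hc1nn Downloads]$ hdfs dfs -mkdir -p /user/hadoop/oozie_wf/fuel/entity/model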
The Hadoop file system ls command lists the three subdirectories I've just created directly under fuel
and that will be used in this example (the manufacturer and model directories sit one level further down,
under entity, and so do not appear at this level).
[hadoop@hc1nn Downloads]$ hdfs dfs -ls /user/hadoop/oozie_wf/fuel/
Found 3 items
drwxr-xr-x - hadoop hadoop 0 2014-07-12 18:16 /user/hadoop/oozie_wf/fuel/entity
drwxr-xr-x - hadoop hadoop 0 2014-07-12 18:15 /user/hadoop/oozie_wf/fuel/pigwf
drwxr-xr-x - hadoop hadoop 0 2014-07-08 18:16 /user/hadoop/oozie_wf/fuel/rawdata
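To verify the manufacturer and model subdirectories under entity as well, you can pass the -R option to ls
for a recursive listing of the whole tree (output omitted here):
[hadoop@hc1nn Downloads]$ hdfs dfs -ls -R /user/hadoop/oozie_wf/fuel/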
The rawdata directory under /user/hadoop/oozie_wf/fuel/ on HDFS will hold the CSV data that I will use,
the pigwf directory will hold the scripts for the task, and the entity directory and its subdirectories
will hold the data used by this task.
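With the directories in place, the raw CSV files can be copied from the Linux file system into the rawdata
directory. The following is a minimal sketch of that copy step, run from the Downloads directory; the
unquoted glob is safe here because the shell expands each matching file name, spaces included, as a single
argument to hdfs:
[hadoop@hc1nn Downloads]$ hdfs dfs -put MY*.csv /user/hadoop/oozie_wf/fuel/rawdata
[hadoop@hc1nn Downloads]$ hdfs dfs -ls /user/hadoop/oozie_wf/fuel/rawdata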
 