oozie.libpath=${nameNode}/user/hadoop/share/lib
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true

hdfsUser=hadoop
wfProject=fuel
hdfsWfHome=${nameNode}/user/${hdfsUser}/oozie_wf/${wfProject}
hdfsRawData=${hdfsWfHome}/rawdata
hdfsEntityData=${hdfsWfHome}/entity

oozie.wf.application.path=${hdfsWfHome}/pigwf
oozieWfPath=${hdfsWfHome}/pigwf/
The parameters in this file specify the Hadoop name node by server and port. Because YARN is being employed, the Resource Manager is defined by its host and port through the jobTracker variable; JobTracker is a Hadoop V1 component name, but Oozie retains the property under YARN, where it simply points at the Resource Manager. The queue name to be used for this workflow, high_pool, is also specified.
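The nameNode, jobTracker, and queueName definitions sit in the opening lines of the properties file, before the excerpt above. As a minimal sketch, they might look like the following, with the hc1nn host taken from the session prompt and the default NameNode and Resource Manager ports assumed:

# host and port values below are assumptions for illustration
nameNode=hdfs://hc1nn:8020
jobTracker=hc1nn:8032
queueName=high_pool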
The location of the Oozie shared library is defined by oozie.libpath, and setting oozie.use.system.libpath to true makes that library available to the job; oozie.wf.rerun.failnodes=true means that a rerun of a failed workflow restarts from its failed nodes rather than from the beginning. The HDFS user for the job is specified, as is a project name. Finally, the paths are defined for the raw data, the workflow scripts, and the entity data that will be produced. The special variable oozie.wf.application.path is used to define the location of the workflow job file.
The workflow.xml file is the main control file for the workflow job. It controls the flow of actions, via Oozie, and manages the subtasks. This workflow runs two parallel streams of processing over the data in the HDFS rawdata directory.
The manufacturer.pig script is called to strip the manufacturer-based data from the HDFS rawdata files; this data is placed in the HDFS entity/manufacturer directory. Then the manufacturer.sql script is called to load that data into the Hive data warehouse.
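The manufacturer.pig script itself is not listed in this section. As a minimal sketch (the delimiter, column position, and literal paths are assumptions for illustration), it might look something like this:

-- load the comma-delimited raw fuel data; no schema is declared
rawdata = LOAD '/user/hadoop/oozie_wf/fuel/rawdata' USING PigStorage(',');

-- keep only the assumed manufacturer column
mfr_col = FOREACH rawdata GENERATE $0 AS manufacturer;

-- remove duplicate manufacturer names
mfr = DISTINCT mfr_col;

-- write the entity data where the Hive load step expects it
STORE mfr INTO '/user/hadoop/oozie_wf/fuel/entity/manufacturer' USING PigStorage(',');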
In parallel to this (via a fork element in the XML), the model.pig script is called to strip the vehicle model-based data from the HDFS rawdata files; this data is placed in the HDFS entity/model directory. Then the model.sql script is called to load that data into the Hive data warehouse.
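The .sql scripts are likewise not listed here. A minimal HiveQL sketch of how manufacturer.sql might move the Pig output into the Hive data warehouse follows; the table name and single-column schema are assumptions:

-- create the target Hive table on the first run (name and schema are assumed)
CREATE TABLE IF NOT EXISTS manufacturer (manufacturer STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- move the Pig output from the entity directory into the Hive warehouse
LOAD DATA INPATH '/user/hadoop/oozie_wf/fuel/entity/manufacturer' INTO TABLE manufacturer;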
The workflow.xml workflow file has been built using a combination of the workflow elements described earlier
(see “The Mechanics of the Oozie Workflow”). I have used the Hadoop file system cat command to display its contents:
[hadoop@hc1nn fuel]$ hdfs dfs -cat /user/hadoop/oozie_wf/fuel/pigwf/workflow.xml
<workflow-app name="FuelWorkFlow" xmlns="uri:oozie:workflow:0.1">

  <start to="pig-fork"/>

  <fork name="pig-fork">
    <path start="pig-manufacturer"/>
    <path start="pig-model"/>
  </fork>

  <action name="pig-manufacturer">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${hdfsEntityData}/manufacturer"/>
      </prepare>
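The listing continues beyond this excerpt. As a sketch, a pig action of this kind would typically conclude by naming its script and its transitions; the ok and error targets below are hypothetical:

      <script>manufacturer.pig</script>
    </pig>
    <ok to="hive-manufacturer"/>
    <error to="fail"/>
  </action>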
 