<join name="hive-join" to="end"/>

<kill name="fail">
    <message>Workflow died, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>

<end name="end"/>
An error condition in any of the four main actions passes control to the kill control node, called fail. The OK
condition simply passes control to the next node in the workflow.
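For reference, each of those actions uses the same pair of transition elements; the following sketch shows the pattern (the action name, action body, and the next-node name are illustrative — only the fail target comes from the workflow above):

```xml
<action name="pig-manufacturer">
    <!-- action body (pig, hive, etc.) omitted -->
    <ok to="next-node"/>   <!-- success: continue the workflow -->
    <error to="fail"/>     <!-- failure: jump to the kill node -->
</action>
```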
Now I will briefly explain the contents of the manufacturer and model Pig and SQL scripts. I have used the Hadoop file
system cat command to display the contents of the manufacturer.pig file.
[hadoop@hc1nn fuel]$ hdfs dfs -cat /user/hadoop/oozie_wf/fuel/pigwf/manufacturer.pig
-- get the raw data from the csv files

rlines = LOAD '/user/hadoop/oozie_wf/fuel/rawdata/*.csv' USING PigStorage(',') AS
  ( year:int, manufacturer:chararray, model:chararray, class:chararray, size:float, cylinders:int,
    transmission:chararray, fuel:chararray, cons_cityl100:float, cond_hwyl100:float, cons_citympgs:int,
    cond_hwympgs:int, lyears:int, co2s:int
  );

mlist = FOREACH rlines GENERATE manufacturer;

dlist = DISTINCT mlist;

-- save to a new file

STORE dlist INTO '/user/hadoop/oozie_wf/fuel/entity/manufacturer/';
The Pig script simply strips the manufacturer column from the raw-data CSV files and stores the distinct values
in the HDFS directory under entity/manufacturer. The SQL script, called manufacturer.sql, then processes that
information and stores it in Hive.
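To make that step concrete, here is a small Python sketch (not part of the workflow) that mimics what the Pig script does: read the CSV rows, project the second column (manufacturer), and keep only the distinct values. The sample rows are hypothetical but follow the schema declared in manufacturer.pig.

```python
import csv
import io

# Hypothetical sample rows shaped like the fuel-consumption CSV data.
raw = io.StringIO(
    "2014,ASTON MARTIN,DB9,MINICOMPACT,5.9,12,A6,Z,18.0,12.2,16,23,2,300\n"
    "2014,ASTON MARTIN,RAPIDE,SUBCOMPACT,5.9,12,A6,Z,18.0,12.2,16,23,2,300\n"
    "2014,AUDI,A4,COMPACT,2.0,4,AM8,Z,9.1,6.5,31,43,2,154\n"
)

# Equivalent of: mlist = FOREACH rlines GENERATE manufacturer;
#                dlist = DISTINCT mlist;
manufacturers = sorted({row[1] for row in csv.reader(raw)})
print(manufacturers)  # ['ASTON MARTIN', 'AUDI']
```

The duplicate ASTON MARTIN row disappears, just as DISTINCT removes it in Pig before the STORE writes the result to HDFS.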
[hadoop@hc1nn fuel]$ hdfs dfs -cat /user/hadoop/oozie_wf/fuel/pigwf/manufacturer.sql
drop table if exists rawdata2 ;

create external table rawdata2 (
  line string
)
location '/user/hadoop/oozie_wf/fuel/entity/manufacturer/' ;

drop table if exists manufacturer ;

create table manufacturer as
select distinct line from rawdata2
where line not like '%=%'
  and line not like '% % %' ;
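The external table points Hive at the directory the Pig script wrote, so each manufacturer name arrives as one text line. The two LIKE filters then drop lines that are not clean names: anything containing an equals sign, and anything with two or more spaces. A rough Python equivalent of that WHERE clause, using hypothetical sample lines:

```python
# Hypothetical lines as they might appear in the Pig output directory.
lines = [
    "ASTON MARTIN",
    "AUDI",
    "pig.job.id=job_1234",   # contains '=' -> filtered by NOT LIKE '%=%'
    "SOME JUNK HEADER LINE", # two or more spaces -> filtered by NOT LIKE '% % %'
    "ASTON MARTIN",          # duplicate -> removed by SELECT DISTINCT
]

# WHERE line NOT LIKE '%=%' AND line NOT LIKE '% % %'
def keep(line):
    return "=" not in line and line.count(" ") < 2

manufacturer = sorted({line for line in lines if keep(line)})
print(manufacturer)  # ['ASTON MARTIN', 'AUDI']
```

Single-space names such as ASTON MARTIN survive the '% % %' filter, since that pattern only matches lines containing at least two spaces.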