<join name="hive-join" to="end"/>

<kill name="fail">
    <message>Workflow died, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>

<end name="end"/>
An error condition in any of the four main actions passes control to the kill control node, called fail. The OK
condition simply passes control to the next node in the workflow.
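For reference, each of those actions uses the same pair of transition elements; the following sketch shows the pattern (the action name, action body, and the next-node name are illustrative — only the fail target comes from the workflow above):

```xml
<action name="pig-manufacturer">
    <!-- action body (pig, hive, etc.) omitted -->
    <ok to="next-node"/>   <!-- success: continue the workflow -->
    <error to="fail"/>     <!-- failure: jump to the kill node -->
</action>
```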
Now I will briefly explain the contents of the manufacturer and model Pig and SQL scripts. I have used the Hadoop file
system cat command to display the contents of the manufacturer.pig file.
[hadoop@hc1nn fuel]$ hdfs dfs -cat /user/hadoop/oozie_wf/fuel/pigwf/manufacturer.pig
-- get the raw data from the csv files

rlines = LOAD '/user/hadoop/oozie_wf/fuel/rawdata/*.csv' USING PigStorage(',') AS
  ( year:int, manufacturer:chararray, model:chararray, class:chararray, size:float, cylinders:int,
    transmission:chararray, fuel:chararray, cons_cityl100:float, cond_hwyl100:float, cons_citympgs:int,
    cond_hwympgs:int, lyears:int, co2s:int
  );

mlist = FOREACH rlines GENERATE manufacturer;

dlist = DISTINCT mlist;

-- save to a new file

STORE dlist INTO '/user/hadoop/oozie_wf/fuel/entity/manufacturer/';
The Pig script simply strips the manufacturer column from the raw-data CSV files and stores the distinct values
in the HDFS directory under entity/manufacturer. The SQL script, called manufacturer.sql, then processes that
information and stores it in Hive.
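To make that step concrete, here is a small Python sketch (not part of the workflow) that mimics what the Pig script does: read the CSV rows, project the second column (manufacturer), and keep only the distinct values. The sample rows are hypothetical but follow the schema declared in manufacturer.pig.

```python
import csv
import io

# Hypothetical sample rows shaped like the fuel-consumption CSV data.
raw = io.StringIO(
    "2014,ASTON MARTIN,DB9,MINICOMPACT,5.9,12,A6,Z,18.0,12.2,16,23,2,300\n"
    "2014,ASTON MARTIN,RAPIDE,SUBCOMPACT,5.9,12,A6,Z,18.0,12.2,16,23,2,300\n"
    "2014,AUDI,A4,COMPACT,2.0,4,AM8,Z,9.1,6.5,31,43,2,154\n"
)

# Equivalent of: mlist = FOREACH rlines GENERATE manufacturer;
#                dlist = DISTINCT mlist;
manufacturers = sorted({row[1] for row in csv.reader(raw)})
print(manufacturers)  # ['ASTON MARTIN', 'AUDI']
```

The duplicate ASTON MARTIN row disappears, just as DISTINCT removes it in Pig before the STORE writes the result to HDFS.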
[hadoop@hc1nn fuel]$ hdfs dfs -cat /user/hadoop/oozie_wf/fuel/pigwf/manufacturer.sql
drop table if exists rawdata2 ;

create external table rawdata2 (
  line string
)
location '/user/hadoop/oozie_wf/fuel/entity/manufacturer/' ;

drop table if exists manufacturer ;

create table manufacturer as
select distinct line from rawdata2
where line not like '%=%'
  and line not like '% % %' ;
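The external table points Hive at the directory the Pig script wrote, so each manufacturer name arrives as one text line. The two LIKE filters then drop lines that are not clean names: anything containing an equals sign, and anything with two or more spaces. A rough Python equivalent of that WHERE clause, using hypothetical sample lines:

```python
# Hypothetical lines as they might appear in the Pig output directory.
lines = [
    "ASTON MARTIN",
    "AUDI",
    "pig.job.id=job_1234",   # contains '=' -> filtered by NOT LIKE '%=%'
    "SOME JUNK HEADER LINE", # two or more spaces -> filtered by NOT LIKE '% % %'
    "ASTON MARTIN",          # duplicate -> removed by SELECT DISTINCT
]

# WHERE line NOT LIKE '%=%' AND line NOT LIKE '% % %'
def keep(line):
    return "=" not in line and line.count(" ") < 2

manufacturer = sorted({line for line in lines if keep(line)})
print(manufacturer)  # ['ASTON MARTIN', 'AUDI']
```

Single-space names such as ASTON MARTIN survive the '% % %' filter, since that pattern only matches lines containing at least two spaces.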