So, my next step is to upload the CSV files from the Linux file system Downloads directory to the HDFS directory
rawdata:
[hadoop@hc1nn Downloads]$ hdfs dfs -copyFromLocal *.csv /user/hadoop/oozie_wf/fuel/rawdata
Now that the workflow data are ready, the scripts and configuration files that the workflow will use need to be
copied into place. For this example, I have created all the necessary files. To begin, I load them to the HDFS pigwf
directory using the Hadoop file system copyFromLocal command:
[hadoop@hc1nn Downloads]$ cd /home/hadoop/oozie/pig/fuel
[hadoop@hc1nn fuel]$ ls
load.job.properties  manufacturer.pig  manufacturer.sql  model.pig  model.sql  workflow.xml
[hadoop@hc1nn fuel]$ hdfs dfs -copyFromLocal * /user/hadoop/oozie_wf/fuel/pigwf
Next, using the Hadoop file system ls command, I check the contents of the pigwf directory. The listing shows
the sizes of the files that were just uploaded:
[hadoop@hc1nn fuel]$ hdfs dfs -ls /user/hadoop/oozie_wf/fuel/pigwf/
Found 6 items
-rw-r--r-- 2 hadoop hadoop 542 2014-07-06 15:48 /user/hadoop/oozie_wf/fuel/pigwf/load.job.properties
-rw-r--r-- 2 hadoop hadoop 567 2014-07-08 19:13 /user/hadoop/oozie_wf/fuel/pigwf/manufacturer.pig
-rw-r--r-- 2 hadoop hadoop 306 2014-07-12 18:06 /user/hadoop/oozie_wf/fuel/pigwf/manufacturer.sql
-rw-r--r-- 2 hadoop hadoop 546 2014-07-08 19:13 /user/hadoop/oozie_wf/fuel/pigwf/model.pig
-rw-r--r-- 2 hadoop hadoop 283 2014-07-12 18:06 /user/hadoop/oozie_wf/fuel/pigwf/model.sql
-rw-r--r-- 2 hadoop hadoop 2400 2014-07-12 18:15 /user/hadoop/oozie_wf/fuel/pigwf/workflow.xml
Note that I actually don't need to copy the load.job.properties file to HDFS; Oozie reads it from the local
Linux file system when the job is submitted. Having uploaded the files, it is time to examine their contents.
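To illustrate why the properties file stays local, the following is a sketch of how such a workflow is typically submitted with the Oozie command-line client. The Oozie server URL and port (11000 is the common default) are assumptions here, not values given in the text; only the -config path matters for this point, and it refers to the local file system, not HDFS:

```shell
# Hypothetical submission command; the server URL is an assumption.
# Note that -config points at the LOCAL copy of the properties file.
oozie job -oozie http://hc1nn:11000/oozie \
          -config /home/hadoop/oozie/pig/fuel/load.job.properties \
          -run
```

The properties file tells Oozie where the workflow lives in HDFS, so only the workflow definition and its scripts need to be uploaded.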
The Workflow Configuration File
The first file is the workflow configuration file, load.job.properties, which specifies parameters for the workflow.
I have listed its contents using the Hadoop file system cat command, and I have taken the liberty of adding line numbers
here and elsewhere to use in explaining the steps:
[hadoop@hc1nn fuel]$ hdfs dfs -cat /user/hadoop/oozie_wf/fuel/pigwf/load.job.properties
01 # ----------------------------------------
02 # Workflow job properties
03 # ----------------------------------------
04
05 nameNode=hdfs://hc1nn:8020
06
07 # Yarn resource manager host and port
08 jobTracker=hc1nn:8032
09 queueName=high_pool
10
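The values defined in this properties file are referenced inside workflow.xml through Oozie's ${...} parameter substitution. The fragment below is an illustrative sketch, not the book's actual workflow file; it shows how nameNode, jobTracker, and queueName would be consumed by a Pig action:

```xml
<!-- Hypothetical fragment: shows how ${nameNode}, ${jobTracker}, and
     ${queueName} from load.job.properties are substituted at run time -->
<workflow-app name="fuel_wf" xmlns="uri:oozie:workflow:0.4">
  <start to="pig-node"/>
  <action name="pig-node">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <script>manufacturer.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Keeping host names, ports, and queue names in the properties file means the same workflow.xml can be promoted between clusters without editing the workflow itself.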
 