Figure 10-5. Pentaho Explorer's Design view
Creating ETL
Now that you have a sense of the PDI interface, it's time to examine an example MapReduce task to see how PDI
functions. I build the ETL example by creating the mapper and reducer transformations first, then the MapReduce
job itself. By following my steps, you'll learn how each module is configured and pick up some tips on how
to avoid pitfalls.
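Before building the transformations in PDI, it may help to see the mapper/reducer pattern they express, sketched in plain Python rather than PDI modules. This illustrative example counts records per vehicle manufacturer; the assumption that the make sits in the second CSV field is based on the sample data shown later in this section.

```python
# A conceptual mapper/reducer pair (plain Python, not PDI) counting
# vehicle records per manufacturer -- the same key/value pattern a
# Pentaho MapReduce job expresses with transformations.
from collections import defaultdict

def mapper(line):
    """Emit a (manufacturer, 1) pair for each CSV record; field 1 is
    assumed to hold the make, as in the sample data."""
    fields = line.split(",")
    yield fields[1], 1

def reducer(pairs):
    """Sum the emitted counts for each manufacturer key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

lines = [
    "1995,ACURA,INTEGRA,SUBCOMPACT,1.8,4,A4,X,10.2,7,28,40,1760,202",
    "1995,ACURA,INTEGRA,SUBCOMPACT,1.8,4,M5,X,9.6,7,29,40,1680,193",
]
counts = reducer(pair for line in lines for pair in mapper(line))
print(counts)  # {'ACURA': 2}
```

In a real PDI job the mapper and reducer are each a transformation, but the key/value flow is the same.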
To create my PDI Map Reduce example, I first need some data. The HDFS file (rawdata.txt) should look familiar—
parts of it were used in earlier chapters. Here, I use fuel consumption details for various vehicle models over a number
of years. The data file is CSV-based and resides under HDFS at /data/pentaho/rdbms/. I use the Hadoop file system
cat command to dump the file contents and the Linux head command to limit the data output:
[hadoop@hc2nn ~]$ hdfs dfs -cat /data/pentaho/rdbms/rawdata.txt | head -5
1995,ACURA,INTEGRA,SUBCOMPACT,1.8,4,A4,X,10.2,7,28,40,1760,202
1995,ACURA,INTEGRA,SUBCOMPACT,1.8,4,M5,X,9.6,7,29,40,1680,193
1995,ACURA,INTEGRA GS-R,SUBCOMPACT,1.8,4,M5,Z,9.4,7,30,40,1660,191
1995,ACURA,LEGEND,COMPACT,3.2,6,A4,Z,12.6,8.9,22,32,2180,251
1995,ACURA,LEGEND COUPE,COMPACT,3.2,6,A4,Z,13,9.3,22,30,2260,260
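Each record holds fourteen comma-separated fields. The file carries no header row, so the column names in this minimal parsing sketch are assumptions inferred from the visible values (model year, make, model, vehicle class, engine size, cylinders, transmission, fuel type, consumption and fuel-cost figures); only the first few are certain from the text.

```python
# Parse one line of rawdata.txt into a dict of named fields.
# Column names are assumptions inferred from the sample values;
# the file itself has no header row.
import csv
from io import StringIO

COLUMNS = ["year", "make", "model", "vehicle_class", "engine_l",
           "cylinders", "transmission", "fuel_type",
           "metric_1", "metric_2", "metric_3", "metric_4",
           "metric_5", "metric_6"]

def parse_record(line):
    """Split one CSV record and pair each value with a column name."""
    values = next(csv.reader(StringIO(line)))
    return dict(zip(COLUMNS, values))

sample = "1995,ACURA,INTEGRA,SUBCOMPACT,1.8,4,A4,X,10.2,7,28,40,1760,202"
record = parse_record(sample)
print(record["year"], record["make"], record["model"])
# 1995 ACURA INTEGRA
```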