The top-left side of the interface in Figure 10-26 shows the local repository for the project bd1; from here, I can
double-click the tmr1 job to open it. At the bottom of the interface are the Designer and Code tabs. The Code tab
enables me to examine the Java code that Talend generates from the job file; the Designer tab allows me both to
configure each step of the job by selecting it and to run the job once the configuration is complete.
Before I use the Open Studio interface, I take a moment to consider the test data that this example
job will use. I have stored two CSV-based data files in the HDFS directory /data/talend/rdbms/, as the
following Hadoop file system ls command shows:
[hadoop@hc2nn ~]$ hdfs dfs -ls /data/talend/rdbms
Found 2 items
-rw-r--r-- 3 hadoop supergroup 1381638 2014-10-10 16:36 /data/talend/rdbms/rawdata.txt
-rw-r--r-- 3 hadoop supergroup 4389 2014-10-18 08:17 /data/talend/rdbms/rawprices.txt
The first file, rawdata.txt, contains the vehicle model fuel consumption data that has been used in previous
chapter examples, while the second file, rawprices.txt, contains the matching model prices. Piping the
Hadoop file system cat command into the Linux head command lists the first 10 rows of each file, as follows:
[hadoop@hc2nn ~]$ hdfs dfs -cat /data/talend/rdbms/rawdata.txt | head -10
1995,ACURA,INTEGRA,SUBCOMPACT,1.8,4,A4,X,10.2,7,28,40,1760,202
1995,ACURA,INTEGRA,SUBCOMPACT,1.8,4,M5,X,9.6,7,29,40,1680,193
1995,ACURA,INTEGRA GS-R,SUBCOMPACT,1.8,4,M5,Z,9.4,7,30,40,1660,191
1995,ACURA,LEGEND,COMPACT,3.2,6,A4,Z,12.6,8.9,22,32,2180,251
1995,ACURA,LEGEND COUPE,COMPACT,3.2,6,A4,Z,13,9.3,22,30,2260,260
1995,ACURA,LEGEND COUPE,COMPACT,3.2,6,M6,Z,13.4,8.4,21,34,2240,258
1995,ACURA,NSX,TWO-SEATER,3,6,A4,Z,13.5,9.2,21,31,2320,267
1995,ACURA,NSX,TWO-SEATER,3,6,M5,Z,12.9,9,22,31,2220,255
1995,ALFA ROMEO,164 LS,COMPACT,3,6,A4,Z,15.7,10,18,28,2620,301
1995,ALFA ROMEO,164 LS,COMPACT,3,6,M5,Z,13.8,9,20,31,2320,267
[hadoop@hc2nn ~]$ hdfs dfs -cat /data/talend/rdbms/rawprices.txt | head -10
ACURA,INTEGRA,44284
ACURA,INTEGRA,44284
ACURA,INTEGRA GS-R,44284
ACURA,LEGEND,44284
ACURA,LEGEND COUPE,44284
ACURA,LEGEND COUPE,44284
ACURA,NSX,32835
ACURA,NSX,32835
ACURA,2.5TL,44284
ACURA,3.2TL,44284
For my example, I plan to use only columns 2 and 3 from the first file, which contain the manufacturer and model
details, and the price information from the second file. (Note that these prices are test data, not real prices.)
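
To make the intended result concrete, the following minimal Python sketch takes columns 2 and 3 (counting from 1) from a few rawdata.txt rows and matches them to prices from rawprices.txt. It is only an illustration of the expected output, not the Talend job or the Java code that Talend generates. The function names load_prices and join_models_with_prices are hypothetical, the sample rows are copied from the listings above, and the use of an inner join keyed on manufacturer plus model is an assumption.

```python
import csv
import io

# Sample rows copied from the head listings above (not the full files).
RAW_DATA = """\
1995,ACURA,INTEGRA,SUBCOMPACT,1.8,4,A4,X,10.2,7,28,40,1760,202
1995,ACURA,NSX,TWO-SEATER,3,6,A4,Z,13.5,9.2,21,31,2320,267
1995,ALFA ROMEO,164 LS,COMPACT,3,6,A4,Z,15.7,10,18,28,2620,301
"""

RAW_PRICES = """\
ACURA,INTEGRA,44284
ACURA,NSX,32835
"""


def load_prices(text):
    """Build a lookup of (manufacturer, model) -> price from CSV text."""
    prices = {}
    for make, model, price in csv.reader(io.StringIO(text)):
        prices[(make, model)] = int(price)
    return prices


def join_models_with_prices(data_text, prices):
    """Keep columns 2 and 3 of each data row and attach the matching price.

    Assumption: an inner join, so rows without a price are dropped.
    """
    joined = []
    for row in csv.reader(io.StringIO(data_text)):
        make, model = row[1], row[2]  # columns 2 and 3, zero-based 1 and 2
        if (make, model) in prices:
            joined.append((make, model, prices[(make, model)]))
    return joined


if __name__ == "__main__":
    for record in join_models_with_prices(RAW_DATA, load_prices(RAW_PRICES)):
        print(record)
```

Because rawdata.txt holds several rows per model (one per transmission type, for example), a join of this kind emits one output row for each matching data row rather than one row per model.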