Clicking the Launch button executes the job and produces the basic-level log output that is shown in Figure 10-23.
Figure 10-23. Results of job run
I have also monitored this job via my Hadoop Resource Manager interface at the URL http://hc2nn.semtech-solutions.co.nz:8088/cluster/apps. This URL allows me to watch the job's progress until it is finished and monitor log files, if necessary. Since the job has finished, there must be an existing part file under the results directory that contains the results data. To see that output, I run this command from the Linux hadoop account:
[hadoop@hc2nn ~]$ hdfs dfs -cat /data/pentaho/result/part-00000 | head -10
ACURA-1.6 EL 2
ACURA-1.6EL 6
ACURA-1.7EL 12
ACURA-2.2CL 2
ACURA-2.3 CL 2
ACURA-2.3CL 2
ACURA-2.5TL 3
ACURA-3.0 CL 1
ACURA-3.0CL 2
ACURA-3.2 TL 1
I use the Hadoop file system cat command to dump the contents of the HDFS-based results part file, and then
the Linux head command to limit the output to the first 10 rows. What I see, then, is a summed list of vehicle makes
and models.
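Should you want to inspect the output location itself, a simple Hadoop file system listing shows the part files the job created. The commands below are only a sketch; they assume the same /data/pentaho/result output path used above, so adjust it to your own location:
[hadoop@hc2nn ~]$ hdfs dfs -ls /data/pentaho/result
[hadoop@hc2nn ~]$ # each reducer writes its own part file, so cat them all to count the full result set
[hadoop@hc2nn ~]$ hdfs dfs -cat /data/pentaho/result/part-* | wc -l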
PDI's visual interface makes it possible for even inexperienced Hadoop users to create and schedule Map Reduce
jobs. You don't need to know Map Reduce programming and can work on client development machines. Simply by
selecting graphical functional icons, plugging them together, and configuring them, you can create complex ETL chains.
Potential Errors
Nothing in life goes perfectly, so let's address some errors you may encounter during job creation.
For instance, while working on the example just given, I tried to connect PDI to MySQL and discovered that the MySQL connector JAR file had not been installed in the PDI library directory. I received the following error message:
Driver class 'org.gjt.mm.mysql.Driver' could not be found, make sure the 'MySQL' driver (jar file)
is installed.
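The fix is to place a copy of the MySQL Connector/J JAR into PDI's library directory and restart the PDI client. The commands below are a sketch only; the connector version and the PDI installation path are assumptions, so substitute the values that match your own system:
[hadoop@hc2nn ~]$ # assumed JAR version and PDI install path; adjust both to your environment
[hadoop@hc2nn ~]$ cp mysql-connector-java-5.1.34-bin.jar /home/hadoop/data-integration/lib/
[hadoop@hc2nn ~]$ # restart Spoon (the PDI client) so that the new driver JAR is picked up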