Next, I click the Job Setup tab to specify the input and output paths for the job data, as shown in Figure 10-19. The input data file is stored on HDFS at /data/pentaho/rdbms, as explained earlier.
Figure 10-19. Job Setup tab for job pmr1
The input and output data formats for this job are defined as Hadoop Map Reduce Java classes, such as org.apache.hadoop.mapred.TextOutputFormat. The Clean option is selected so that the job can be rerun: each time the job runs, it first cleans out the results directory.
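As a point of reference outside the Pentaho GUI, the sketch below shows roughly the same job setup as a hand-written driver against the old org.apache.hadoop.mapred API. The input path, the job name pmr1, and TextOutputFormat come from the settings above; the output path /data/pentaho/result is an illustrative assumption, as is the use of TextInputFormat on the input side.

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class Pmr1Driver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(Pmr1Driver.class);
        conf.setJobName("pmr1");

        // Input/output formats: the same classes named on the Job Setup tab
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Input path on HDFS as described above; the output path is assumed
        FileInputFormat.setInputPaths(conf, new Path("/data/pentaho/rdbms"));
        Path out = new Path("/data/pentaho/result");
        FileOutputFormat.setOutputPath(conf, out);

        // The "Clean" option: remove any previous results so the job can be rerun
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(out)) {
            fs.delete(out, true); // recursive delete of the results directory
        }

        JobClient.runJob(conf);
    }
}

With no mapper or reducer set, the old API falls back to its identity classes, so this driver simply passes the input through; it is only meant to show where the format classes and the Clean behavior fit.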
Lastly, I define the connection to the Hadoop cluster using the Cluster tab. As you can see in Figure 10-20, the only fields that I have changed in this tab are the hostnames and ports, so that Pentaho knows which host to connect to (hc2nn) for HDFS and Map Reduce. I have also specified the ports: 8020 for HDFS and 8032 for the Resource Manager (the field is labeled Job Tracker, but because this is a CDH5 cluster using YARN, it actually points at the Resource Manager).
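For readers who prefer to see those fields as configuration rather than GUI entries, this is a minimal sketch of the equivalent client-side Hadoop settings in Java. The property names are standard Hadoop/YARN configuration keys, not anything Pentaho-specific; the wrapping class is illustrative.

import org.apache.hadoop.conf.Configuration;

public class ClusterSettings {
    public static Configuration clusterConf() {
        Configuration conf = new Configuration();
        // HDFS NameNode host and port from the Cluster tab (hc2nn:8020)
        conf.set("fs.defaultFS", "hdfs://hc2nn:8020");
        // The "Job Tracker" field maps to the YARN Resource Manager on CDH5
        conf.set("yarn.resourcemanager.address", "hc2nn:8032");
        conf.set("mapreduce.framework.name", "yarn");
        return conf;
    }
}

Port 8020 is the default NameNode port on CDH, and 8032 is the default yarn.resourcemanager.address port, which is why these values match a stock CDH5 cluster.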