Next, I click the Job Setup tab to specify the input and output paths for the job data, as shown in Figure 10-19. The input data file is stored on HDFS at /data/pentaho/rdbms, as explained earlier.
Figure 10-19. Job Setup tab for job pmr1
The input and output data formats for this job are defined as Hadoop MapReduce-based Java classes, such as org.apache.hadoop.mapred.TextOutputFormat. The Clean option is selected so that the job can be rerun: each time the job runs, it first clears out the results directory.
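The settings on the Job Setup tab correspond to ordinary Hadoop job properties. The fragment below is a rough sketch, using the legacy mapred-API property names that match the org.apache.hadoop.mapred format classes; the input format class and the output path shown here are illustrative assumptions, not values taken from the Pentaho dialog.

```xml
<!-- Approximate job configuration equivalent to the Job Setup tab
     (legacy mapred API property names; illustrative sketch only). -->
<configuration>
  <property>
    <name>mapred.input.dir</name>
    <value>/data/pentaho/rdbms</value>           <!-- input path on HDFS -->
  </property>
  <property>
    <name>mapred.output.dir</name>
    <value>/data/pentaho/rdbms/result</value>    <!-- hypothetical output path -->
  </property>
  <property>
    <name>mapred.input.format.class</name>
    <!-- assumed input format; the text names only the output format -->
    <value>org.apache.hadoop.mapred.TextInputFormat</value>
  </property>
  <property>
    <name>mapred.output.format.class</name>
    <value>org.apache.hadoop.mapred.TextOutputFormat</value>
  </property>
</configuration>
```

The Clean option has no counterpart among these properties; Pentaho simply deletes the output directory before each run, which is what allows the job to be resubmitted without a "directory already exists" failure.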
Lastly, I define the connection to the Hadoop cluster using the Cluster tab. As you can see in Figure 10-20, the only fields that I have changed on this tab are the hostnames and ports, so that Pentaho knows which host to connect to (hc2nn) for HDFS and MapReduce. I have also specified the ports: 8020 for HDFS and 8032 for the Resource Manager (which is actually labeled as the Job Tracker, but this is a CDH5 cluster using YARN).
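These are the same endpoints any Hadoop client would use. On the cluster side they correspond to standard CDH5/YARN configuration properties along the following lines (hc2nn is the host from this example; file placement may vary with your distribution):

```xml
<!-- core-site.xml: HDFS endpoint (port 8020) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hc2nn:8020</value>
</property>

<!-- yarn-site.xml: Resource Manager endpoint (port 8032),
     which Pentaho's Cluster tab labels "Job Tracker" -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>hc2nn:8032</value>
</property>
```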