Remember, if you plan to access MySQL with PDI, you also need to install the MySQL JDBC connector JAR file, mysql-connector-java-5.1.32-bin.jar, from http://dev.mysql.com/downloads/connector/j/ into the PDI directory data-integration\lib.
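For example, on a Linux client the copy is a single command once the JAR has been downloaded; the installation path /opt/data-integration shown here is an assumption, so adjust it to match your own PDI installation:

$ cp mysql-connector-java-5.1.32-bin.jar /opt/data-integration/lib/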
When PDI uses the big data plug-in, it copies libraries and configuration files to a directory called /opt/pentaho on HDFS, so you need to make sure the user account you're using for PDI has the correct permissions there. In my example, I was running PDI from a client Windows machine, which used the user ID from the Windows session to access HDFS, and I received the following error message:
2014/09/30 18:26:20 - Pentaho MapReduce 2 - Installing Kettle to /opt/pentaho/mapreduce/5.1.0.0-5.1.0.0-752-cdh42
2014/09/30 18:26:28 - Pentaho MapReduce 2 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Kettle installation failed
2014/09/30 18:26:28 - Pentaho MapReduce 2 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : org.apache.hadoop.security.AccessControlException: Permission denied: user=mikejf12, access=WRITE, inode="/":hdfs:hadoop:drwxr-xr-x
The error occurred because the Windows account (mikejf12) did not have write access to the directory on HDFS. You can resolve this type of problem by using the HDFS chown and chmod commands to grant access, as the commands below show:
[hadoop@hc2nn ~]$ hdfs dfs -chown mikejf12 /opt/pentaho
[hadoop@hc2nn ~]$ hdfs dfs -chmod 777 /opt/pentaho
[hadoop@hc2nn ~]$ hdfs dfs -ls /opt
Found 1 items
drwxrwxrwx - mikejf12 hadoop 0 2014-10-25 16:02 /opt/pentaho
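Note that if the /opt/pentaho directory does not already exist on HDFS, you may need to create it first as a user with write access to the parent directory (here, the hadoop account from the prompt above) before changing its ownership:

[hadoop@hc2nn ~]$ hdfs dfs -mkdir -p /opt/pentaho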
Unfortunately, deadlines prevented me from resolving an error that occurred on my Linux CDH 4.6 cluster when I tried to run a PDI MapReduce job. I knew that the fault was not with PDI but with the configuration of the cluster, probably YARN. Here's the error message I received:
2014/10/01 18:08:56 - Pentaho MapReduce 2 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Unknown rpc kind RPC_WRITABLE
2014/10/01 18:08:56 - Pentaho MapReduce 2 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc kind RPC_WRITABLE
The cluster was running, but it was not configured quite the way PDI needs. A cluster configured with the CDH 5 manager seems to work, so the difference between the two configurations must hold the clue to the solution.
The following error occurred when I tried to run the example PDI application on CentOS Linux:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x80a3812b, pid=4480, tid=3078466416
I resolved it by stopping the application from showing the welcome page at startup. To do so, I simply added the
following line to the file $HOME/.kettle/.spoonrc of the user running PDI:
ShowWelcomePageOnStartup=N
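If the .spoonrc file does not yet exist, you can create it and add the setting from a shell as the user running PDI; this is a simple sketch, and if the file already exists you only need to append the line:

$ mkdir -p $HOME/.kettle
$ echo "ShowWelcomePageOnStartup=N" >> $HOME/.kettle/.spoonrc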