Installing Pig
Installing Pig is very simple; what is hard is getting it to work nicely with Hadoop and
Cassandra. To install Pig, just download the latest version of Pig and untar it as follows:
$ wget http://www.eng.lsu.edu/mirrors/apache/pig/pig-0.11.1/pig-0.11.1.tar.gz
$ tar xvzf pig-0.11.1.tar.gz
$ ln -s pig-0.11.1 pig
Let's call this directory $PIG_HOME. Ideally, you should just be able to execute
$PIG_HOME/bin/pig and have the Pig console start, provided that your Cassandra and Hadoop
installations are up and running. Unfortunately, it does not work out of the box, and at
the time of writing, the documentation is not adequate for configuring Pig. To get Pig
started, you need to do the following:
1. Set the HADOOP_PREFIX variable to Hadoop's installation directory.
2. Add all the JAR files in Cassandra's lib directory to PIG_CLASSPATH.
3. Add udf.import.list to the Pig options variable, PIG_OPTS, as follows:
export PIG_OPTS="$PIG_OPTS -Dudf.import.list=org.apache.cassandra.hadoop.pig"
4. Set PIG_INITIAL_ADDRESS, PIG_RPC_PORT, and PIG_PARTITIONER to the address of one of the
Cassandra nodes, the Cassandra RPC port, and the Cassandra partitioner, respectively.
You may write a simple shell script that performs all four steps for you, assuming
$CASSANDRA_HOME points to the Cassandra installation directory.
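A minimal sketch of such a script follows; it is an illustration, not the author's original script. It is meant to be sourced (for example, `. ./pig-env.sh`) so that the exported variables reach your current shell. The file name pig-env.sh, the fallback paths /opt/hadoop and /opt/cassandra, and the connection values (node 127.0.0.1, Thrift RPC port 9160, Murmur3Partitioner) are assumptions; replace them with your own cluster's values.

```shell
# pig-env.sh: hypothetical helper that performs the four setup steps.
# Source it (". ./pig-env.sh") so the exports reach your current shell.
# Fallback paths and connection values below are assumptions.

# Step 1: Hadoop installation directory.
export HADOOP_PREFIX="${HADOOP_PREFIX:-/opt/hadoop}"

# Step 2: append every JAR in Cassandra's lib directory to PIG_CLASSPATH.
CASSANDRA_HOME="${CASSANDRA_HOME:-/opt/cassandra}"
for jar in "$CASSANDRA_HOME"/lib/*.jar; do
  [ -e "$jar" ] || continue   # skip the literal pattern if nothing matched
  PIG_CLASSPATH="${PIG_CLASSPATH:+$PIG_CLASSPATH:}$jar"
done
export PIG_CLASSPATH

# Step 3: let Pig resolve Cassandra's Pig classes (e.g. CassandraStorage)
# without the full package name.
export PIG_OPTS="$PIG_OPTS -Dudf.import.list=org.apache.cassandra.hadoop.pig"

# Step 4: one Cassandra node, its RPC (Thrift) port, and the partitioner.
export PIG_INITIAL_ADDRESS="${PIG_INITIAL_ADDRESS:-127.0.0.1}"
export PIG_RPC_PORT="${PIG_RPC_PORT:-9160}"
export PIG_PARTITIONER="${PIG_PARTITIONER:-org.apache.cassandra.dht.Murmur3Partitioner}"
```

After sourcing it, start the console with $PIG_HOME/bin/pig. Values already set in your environment take precedence over the fallbacks, and PIG_PARTITIONER must name the same partitioner class as the partitioner setting in your cluster's cassandra.yaml.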
Note
Pig 0.14, Cassandra 2.1.2, and Hadoop 2.6.0 have some classpath conflicts with one another.
Some JARs had to be added and others deleted to make the integration work. In particular,
you may want to replace all Guava libraries with Guava version 16.0: Cassandra does not work
with older versions, and Hadoop fails with newer ones (17 onwards; see
https://issues.apache.org/jira/browse/HADOOP-11032).
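The Guava swap described in the note can be sketched as a small shell function, assuming you have already downloaded guava-16.0.jar. The function name replace_guava and its .bak backup convention are hypothetical. It scans a whole directory tree, because a distribution may keep Guava in more than one lib directory, and it renames each other guava-*.jar instead of deleting it, so the change can be undone.

```shell
# Hypothetical helper: swap every Guava JAR under a directory tree for
# Guava 16.0. Originals are renamed with a .bak suffix, not deleted.
# $1 = directory to scan (e.g. "$HADOOP_PREFIX"), $2 = path to guava-16.0.jar
replace_guava() {
  dir=$1
  new_jar=$2
  find "$dir" -type f -name 'guava-*.jar' ! -name 'guava-16.0.jar' |
  while IFS= read -r old_jar; do
    echo "replacing $old_jar" >&2
    mv "$old_jar" "$old_jar.bak"           # .jar.bak no longer matches guava-*.jar
    target="$(dirname "$old_jar")/guava-16.0.jar"
    [ -e "$target" ] || cp "$new_jar" "$target"
  done
}
```

For example, run replace_guava "$HADOOP_PREFIX" ~/guava-16.0.jar, and repeat for $CASSANDRA_HOME and $PIG_HOME if they contain other Guava versions.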