Installing Pig
Installing Pig is very simple; what is hard is getting it to work nicely with Hadoop and Cassandra. To install Pig, just download the latest version of Pig and untar it as follows:
$ wget http://www.eng.lsu.edu/mirrors/apache/pig/pig-0.11.1/pig-0.11.1.tar.gz
$ tar xvzf pig-0.11.1.tar.gz
$ ln -s pig-0.11.1 pig
Let's call this directory $PIG_HOME. Ideally, you would just execute $PIG_HOME/bin/pig, and the Pig console would start, provided that Cassandra and Hadoop are up and running. Unfortunately, it does not. At the time of writing, the documentation is not adequate for configuring Pig. To get Pig started, you need to do the following:
1. Set Hadoop's installation directory as the HADOOP_PREFIX variable.
2. Add all the JAR files in Cassandra's lib directory to PIG_CLASSPATH.
3. Add udf.import.list to the PIG_OPTS Pig options variable, as follows:
export PIG_OPTS="$PIG_OPTS -Dudf.import.list=org.apache.cassandra.hadoop.pig";
4. Set the address of one of the Cassandra nodes, the Cassandra RPC port, and the Cassandra partitioner in PIG_INITIAL_ADDRESS, PIG_RPC_PORT, and PIG_PARTITIONER, respectively.
You may write a simple shell script that does this for you. Here is a shell script that covers the four steps (assuming $CASSANDRA_HOME points to the Cassandra installation directory).
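One possible version of such a script is sketched below. It only exports variables, so it is meant to be sourced into your current shell rather than executed as a child process. The use of HADOOP_HOME with a /usr/local/hadoop fallback, the node address localhost, the Thrift RPC port 9160, and Murmur3Partitioner are illustrative assumptions; substitute the values for your own cluster.

```shell
#!/bin/sh
# Sketch: prepare the environment for running Pig against Cassandra.
# Assumptions (adjust for your cluster): Hadoop is located via $HADOOP_HOME
# (falling back to /usr/local/hadoop), a Cassandra node runs on localhost,
# Thrift RPC listens on port 9160, and the cluster uses Murmur3Partitioner.

# Step 1: Hadoop's installation directory.
export HADOOP_PREFIX="${HADOOP_HOME:-/usr/local/hadoop}"

# Step 2: append every JAR in Cassandra's lib directory to PIG_CLASSPATH.
for jar in "$CASSANDRA_HOME"/lib/*.jar; do
  [ -e "$jar" ] || continue   # skip the literal pattern when no JAR matches
  PIG_CLASSPATH="${PIG_CLASSPATH:+$PIG_CLASSPATH:}$jar"
done
export PIG_CLASSPATH

# Step 3: import Cassandra's Pig package so its functions (for example,
# CassandraStorage) can be used without the full package prefix.
export PIG_OPTS="$PIG_OPTS -Dudf.import.list=org.apache.cassandra.hadoop.pig"

# Step 4: how Pig reaches the Cassandra cluster.
export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
```

For example, if you save it as pig-env.sh (a hypothetical name), run . ./pig-env.sh and then start $PIG_HOME/bin/pig from the same shell.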
Note
Pig 0.14, Cassandra 2.1.2, and Hadoop 2.6.0 have some classpath conflicts with one another. Some JARs had to be added and removed to make the integration work. In particular, you may want to replace all Guava libraries with Guava version 16.0: Cassandra does not work with the older versions, and Hadoop fails with the newer ones (17 onwards,
ht-