In this section, we create a Cassandra table to store tweets and then use Apache Pig to load the tweets into it.
1. First, let's create a keyspace and a table in Cassandra:
create keyspace twitter
  with replication = {'class':'SimpleStrategy','replication_factor':1};
use twitter;
create table twitterdata (
  screen_name text primary key,
  tweetdate text,
  body text
);
2. Start Pig in local mode (pig -x local) and load the tweets:
tweets = LOAD '/home/vivek/tweets'
  USING PigStorage('\ua001')
  AS (date:chararray, screen_name:chararray, body:chararray);
Change the path to match your own setup.
3. Register apache-cassandra-2.0.4.jar and define CqlStorage:
register '$CASSANDRA_HOME/lib/apache-cassandra-2.0.4.jar';
define CqlStorage org.apache.cassandra.hadoop.pig.CqlStorage();
4. Use TOTUPLE to build the tuples that CqlStorage expects:
-- first tuple: primary-key (name, value) pairs; second tuple: values for the ? markers
data_to = FOREACH tweets GENERATE
  TOTUPLE(TOTUPLE('screen_name', screen_name)),
  TOTUPLE(date, body);
5. Finally, store the generated tuples in Cassandra using CqlStorage:
-- %3D%3F is the URL-encoded form of =?
STORE data_to INTO
  'cql://twitter/twitterdata?output_query=update twitterdata set tweetdate %3D%3F,body %3D%3F'
  USING CqlStorage();
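To make the record shapes in steps 2 and 4 concrete, here is a small Python sketch (not part of the Pig job) that splits one line on the U+A001 delimiter and builds the shape CqlStorage expects: a tuple of (column, value) pairs for the primary key, followed by a tuple of plain values for the ? markers in output_query. The function names and the sample line are hypothetical, and the split is stricter than PigStorage because it rejects lines that do not have exactly three fields.

```python
# Sketch of the record shapes from steps 2 and 4 (illustration only, not Pig).
# Assumption: each input line is date, screen_name and body separated by
# U+A001, the delimiter passed to PigStorage in step 2.
DELIMITER = "\ua001"


def parse_line(line):
    """Split one line on DELIMITER into (date, screen_name, body).

    Simplified: unlike PigStorage, this raises ValueError unless the line
    has exactly three fields.
    """
    fields = line.rstrip("\n").split(DELIMITER)
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    date, screen_name, body = fields
    return date, screen_name, body


def to_cql_tuple(date, screen_name, body):
    """Build the (key tuples, bound values) pair written through CqlStorage.

    The first element names each primary-key column with its value; the
    second holds the values for the ? markers in output_query, in order.
    """
    keys = (("screen_name", screen_name),)
    values = (date, body)
    return keys, values


# Hypothetical sample line; real data comes from your own tweets file.
sample = DELIMITER.join(["2014-01-20", "alice", "Trying Pig with Cassandra"])
print(to_cql_tuple(*parse_line(sample)))
# ((('screen_name', 'alice'),), ('2014-01-20', 'Trying Pig with Cassandra'))
```

Note that the second tuple holds bare values only; the column names for those values come from the SET clause of output_query, not from the tuple.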
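The following Python sketch models what the STORE step asks Cassandra to do. It does not connect to Cassandra; a dict stands in for the twitterdata table, and the helper names are hypothetical. It assumes CqlStorage completes the update with a WHERE clause on screen_name taken from the key tuple. It also shows a consequence of the schema from step 1: because screen_name is the only primary-key column and a CQL UPDATE is an upsert, each user keeps only the most recently written tweet.

```python
# Sketch of the STORE step's effect; a dict stands in for the twitterdata table.
# Assumption: CqlStorage appends "WHERE screen_name = ?" using the key tuple,
# so only the SET part appears in output_query.
from urllib.parse import unquote

ENCODED_QUERY = "update twitterdata set tweetdate %3D%3F,body %3D%3F"


def decode_output_query(encoded):
    """Turn the URL-encoded output_query back into CQL (%3D is '=', %3F is '?')."""
    return unquote(encoded)


def apply_update(table, keys, values):
    """Simulate 'UPDATE twitterdata SET tweetdate = ?, body = ? WHERE screen_name = ?'.

    UPDATE in Cassandra is an upsert: the row is created if missing and
    overwritten if present, so a later tweet replaces an earlier one.
    """
    key = dict(keys)["screen_name"]
    tweetdate, body = values
    table[key] = {"tweetdate": tweetdate, "body": body}


print(decode_output_query(ENCODED_QUERY))
# update twitterdata set tweetdate =?,body =?

table = {}
apply_update(table, (("screen_name", "alice"),), ("2014-01-20", "first tweet"))
apply_update(table, (("screen_name", "alice"),), ("2014-01-21", "second tweet"))
print(table)
# {'alice': {'tweetdate': '2014-01-21', 'body': 'second tweet'}}
```

If every tweet should be kept, the table would need a primary key that also includes a per-tweet column; the schema used in this section stores one row per screen name.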