Integration with Hadoop - Mastering Apache Cassandra


-------+-------
Alice | 377

cqlsh:testks> select * from resultCF where key = 'Hatter';
KEY | count
--------+-------
Hatter | 54

cqlsh:testks> select * from resultCF where key = 'Cat';
KEY | count
-----+-------
Cat | 23

There is a small difference in the word counts, but that is likely because the split that I use and the split function that Pig uses break the text into words differently.
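To make that point concrete, here is a minimal Python sketch of how two tokenizers can count the same word differently. It assumes Pig's TOKENIZE is using its documented default delimiters (space, double quote, comma, parentheses, and star). The second splitter, which breaks on every non-word character, is only a hypothetical stand-in for a hand-written split, and the sample line is made up for illustration.

```python
import re
from collections import Counter

# Default delimiters documented for Pig's TOKENIZE when none are supplied:
# space, double quote, comma, parentheses, and star.
PIG_DELIMS = ' ",()*'


def pig_like_tokens(line):
    """Approximate TOKENIZE: split only on the default delimiters."""
    pattern = "[" + re.escape(PIG_DELIMS) + "]+"
    return [t for t in re.split(pattern, line) if t]


def word_tokens(line):
    """Hypothetical hand-written split: any non-word character is a delimiter."""
    return [t for t in re.split(r"\W+", line) if t]


# Made-up sample line; not taken from the data set used in the text.
line = 'Alice said, "Alice! Where is Alice?"'

pig_counts = Counter(pig_like_tokens(line))
word_counts = Counter(word_tokens(line))

# "Alice!" and "Alice?" stay distinct tokens under the Pig-like split,
# so the two counts for "Alice" disagree.
print(pig_counts["Alice"], word_counts["Alice"])
```

Punctuation that one splitter strips and the other keeps attached to the word is enough to shift individual counts, which is consistent with the small differences seen above.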

Note that the Pig Latin used here may be very inefficient; the purpose of this example is only to show the Cassandra and Pig integration. To learn more about Pig Latin, reading Apache Pig's official tutorial (http://pig.apache.org/docs/r0.11.1/start.html#tutorial) is recommended.

You may also want to use CQL with Pig. To do so, you will have to use CqlStorage (with some versions, CqlStorage may not work, so try using CqlNativeStorage instead). A word count example looks as follows:

grunt> alice = LOAD 'cql://hadoop_test/lines' USING CqlStorage();

grunt> B = foreach alice generate flatten(TOKENIZE((chararray)$0)) as word;

grunt> C = group B by word;

grunt> D = foreach C generate COUNT(B) as word_count, group as word;

grunt> E = FOREACH D GENERATE TOTUPLE(TOTUPLE('word',word)),TOTUPLE('word_count', word_count);

grunt> STORE E INTO 'cql://hadoop_test/output?output_query=UPDATE%20hadoop_test.output%20SET%20word_count%20%3D%20%3F' USING CqlStorage();
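The output_query parameter in the STORE statement is a percent-encoded CQL statement, which is hard to read at a glance. As a small aid, the following Python sketch decodes it and shows how the same encoding can be produced for a query of your own. It uses only the standard library; nothing here is part of Pig or Cassandra.

```python
from urllib.parse import quote, unquote

# The encoded output_query value exactly as it appears in the STORE statement.
encoded = "UPDATE%20hadoop_test.output%20SET%20word_count%20%3D%20%3F"

# Decode it to see the CQL statement that is prepared for each output row.
decoded = unquote(encoded)
print(decoded)

# Encoding goes the other way: spaces, '=' and '?' are escaped, while
# letters, digits, '_' and '.' are left untouched.
reencoded = quote(decoded)
print(reencoded == encoded)
```

The decoded statement is UPDATE hadoop_test.output SET word_count = ?, where the question mark is a bind marker that is filled in from the tuples being stored.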
