MapReduce with Cassandra - Beginning Apache Cassandra Development

RowKey: Hazem Saleh

=> (name=count, value=334, timestamp=1407569960798)

-------------------

RowKey: cessprin

=> (name=count, value=334, timestamp=1407569960990)

-------------------

RowKey: Thu Feb 23 00:24:16 IST 2012

=> (name=count, value=334, timestamp=1407569960966)

-------------------

RowKey: Hunter Scott

=> (name=count, value=334, timestamp=1407569960803)

22 Rows Returned.

The sample output from the stored tweetcount column family contains 22 rows in total.
Each row key is either a tweet date or a user name, and each row holds a single column
named count that stores the computed value.

The CQL3 Way

In previous chapters we discussed the differences and interoperability issues between
CQL3 and Thrift. Cassandra provides CQL-compatible input and output format classes for
MapReduce. Note that these implementations are still based on Thrift rather than on the
native CQL3 driver. In this example, we will use CqlOutputFormat to write the output to
a CQL3 column family. Column families created via CQL3 are not visible through Thrift,
so let's explore how to run MapReduce over CQL3 tables. Running the preceding MapReduce
program with CQL3 requires very few changes: we need to define the table in CQL3 and
change the Hadoop job configuration to point to the CQL3-based output format. The
changes are as follows:

1. First, you need to create the table tweetcount_cql:

create table tweetcount_cql (key text primary key, count int);

2. Changes required at the Hadoop job level are:
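One part of the job-level change is the shape of the reducer output. The following is a minimal sketch, not this chapter's own listing. It assumes the CQL3 output format takes the partition key as a map from column name to serialized value, and the remaining values as a list of bound variables for a prepared statement such as UPDATE tweetcount_cql SET count = ?. This mirrors the CQL3 word-count example bundled with Cassandra. The class and method names are hypothetical, and only JDK classes are used so the sketch runs without Hadoop or Cassandra on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: builds the two objects a reducer would emit
// when writing to the tweetcount_cql table through a CQL3 output format.
public class TweetCountCqlOutputSketch {

    // Serialize a CQL text value as UTF-8 bytes.
    static ByteBuffer textBytes(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }

    // Serialize a CQL int value as 4 big-endian bytes.
    static ByteBuffer intBytes(int value) {
        ByteBuffer buffer = ByteBuffer.allocate(4);
        buffer.putInt(value);
        buffer.flip(); // make the written bytes readable from position 0
        return buffer;
    }

    // Primary key part: column name ("key") mapped to its serialized value.
    static Map<String, ByteBuffer> keyFor(String rowKey) {
        Map<String, ByteBuffer> keys = new LinkedHashMap<>();
        keys.put("key", textBytes(rowKey));
        return keys;
    }

    // Bound variables for the assumed statement:
    //   UPDATE tweetcount_cql SET count = ?
    static List<ByteBuffer> bindVariablesFor(int count) {
        List<ByteBuffer> variables = new ArrayList<>();
        variables.add(intBytes(count));
        return variables;
    }

    public static void main(String[] args) {
        // Values taken from the sample output shown earlier.
        Map<String, ByteBuffer> keys = keyFor("Hunter Scott");
        List<ByteBuffer> variables = bindVariablesFor(334);

        // In a real reducer (assumption) these would be passed to
        // context.write(keys, variables) instead of being printed.
        String key = StandardCharsets.UTF_8.decode(keys.get("key").duplicate()).toString();
        int count = variables.get(0).duplicate().getInt();
        System.out.println(key + " -> " + count); // prints "Hunter Scott -> 334"
    }
}
```

The map carries only the primary key column, while every non-key column (here just count) travels as a bound variable, so the same pair of objects lines up with the key and count columns of the table created in step 1.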
