RowKey: Hazem Saleh
=> (name=count, value=334, timestamp=1407569960798)
-------------------
RowKey: cessprin
=> (name=count, value=334, timestamp=1407569960990)
-------------------
RowKey: Thu Feb 23 00:24:16 IST 2012
=> (name=count, value=334, timestamp=1407569960966)
-------------------
RowKey: Hunter Scott
=> (name=count, value=334, timestamp=1407569960803)
22 Rows Returned.
The stored tweetcount output contains 22 rows in total. Each row key is either a tweet date or a user name, and each row holds a single column named count that stores the corresponding count value.
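To make this row layout concrete, here is a small JDK-only sketch that reads a listing in the format shown above into a map from row key to count. It assumes only the two line shapes visible in the sample ("RowKey: ..." followed by "=> (name=count, value=..., timestamp=...)"). The class and method names are hypothetical, and this is plain text parsing, not a Cassandra API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses a cassandra-cli style listing of the tweetcount rows.
public class TweetCountListing {

    // Matches a row header such as "RowKey: Hunter Scott".
    private static final Pattern ROW_KEY = Pattern.compile("^RowKey: (.+)$");

    // Matches the single count column, e.g. "=> (name=count, value=334, timestamp=1407569960803)".
    private static final Pattern COUNT_COLUMN =
            Pattern.compile("^=> \\(name=count, value=(\\d+), timestamp=\\d+\\)$");

    // Returns row key -> count in listing order; separator and summary lines are ignored.
    static Map<String, Integer> parse(String listing) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String currentKey = null;
        for (String raw : listing.split("\\R")) {
            String line = raw.trim();
            Matcher key = ROW_KEY.matcher(line);
            if (key.matches()) {
                currentKey = key.group(1);
                continue;
            }
            Matcher column = COUNT_COLUMN.matcher(line);
            if (column.matches() && currentKey != null) {
                counts.put(currentKey, Integer.parseInt(column.group(1)));
                currentKey = null;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String listing = String.join("\n",
                "RowKey: Hazem Saleh",
                "=> (name=count, value=334, timestamp=1407569960798)",
                "-------------------",
                "RowKey: Thu Feb 23 00:24:16 IST 2012",
                "=> (name=count, value=334, timestamp=1407569960966)",
                "22 Rows Returned.");
        // Prints {Hazem Saleh=334, Thu Feb 23 00:24:16 IST 2012=334}
        System.out.println(parse(listing));
    }
}
```

Whether a row key is a date or a user name makes no difference to the structure: every row is just a key plus one count column.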
The CQL3 Way
In previous chapters we discussed the differences and interoperability issues between CQL3 and Thrift. Cassandra provides CQL-compatible input and output format classes for MapReduce. Note that these implementations are still based on Thrift rather than on the native CQL3 driver. In this example, we will use CqlOutputFormat to write output to a CQL3 column family. Because column families created via CQL3 are not visible through Thrift, let's explore how to run MapReduce over CQL3 tables. Running the preceding MapReduce program with CQL3 requires very few changes: we need to define the table in CQL3 format and change the Hadoop job configuration to point to the CQL3-based output format. Let's discuss these changes in turn.
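Before walking through the steps, it may help to see what a reducer hands to a CQL3 output format. Assuming the Thrift-based CqlOutputFormat of the Cassandra 1.2/2.0 era, each output row is emitted as a Map<String, ByteBuffer> of primary-key columns plus a List<ByteBuffer> of values bound to the output statement configured for the job (for example, an UPDATE that sets count = ?). The JDK-only sketch below shows that encoding for one row of a table with a text primary key named key and an int column named count, such as the tweetcount_cql table created in step 1. The class and method names are hypothetical, and the Hadoop Reducer and job wiring are deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: encodes one (key, count) row in the key/value shape
// assumed for CqlOutputFormat (primary-key map + list of bound values).
public class CqlRowEncoding {

    // Primary key: maps the key column name to its serialized value.
    // A CQL text value is serialized as UTF-8 bytes.
    static Map<String, ByteBuffer> encodeKey(String rowKey) {
        Map<String, ByteBuffer> keys = new LinkedHashMap<>();
        keys.put("key", ByteBuffer.wrap(rowKey.getBytes(StandardCharsets.UTF_8)));
        return keys;
    }

    // Bound values for the output statement (assumed here to bind only count).
    // A CQL int is serialized as 4 big-endian bytes (ByteBuffer's default order).
    static List<ByteBuffer> encodeCount(int count) {
        ByteBuffer value = ByteBuffer.allocate(4);
        value.putInt(count);
        value.flip(); // rewind so the 4 bytes are readable
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> key = encodeKey("Hunter Scott");
        List<ByteBuffer> values = encodeCount(334);
        // duplicate() so decoding does not consume the original buffers
        String decodedKey = StandardCharsets.UTF_8.decode(key.get("key").duplicate()).toString();
        int decodedCount = values.get(0).duplicate().getInt();
        System.out.println(decodedKey + " -> " + decodedCount); // Hunter Scott -> 334
    }
}
```

In a real job, a reducer would write encodeKey(...) and encodeCount(...) as its output key and value; check the CqlOutputFormat and CqlConfigHelper documentation for your Cassandra version before relying on this contract.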
1. First, create the table tweetcount_cql:
create table tweetcount_cql (key text primary key, count int);
2. The changes required at the Hadoop job level are: