Integrating Hadoop - Cassandra: The Definitive Guide

ColumnFamilyRecordReader

The layer at which individual records from Cassandra are read. It's an extension of Hadoop's RecordReader abstract class.

There are similar classes for outputting data to Cassandra in the Hadoop package, but at the time of this writing, those classes are still being finalized.

Running the Word Count Example

Word count is one of the examples given in the MapReduce paper and is the starting point for many who are new to the framework. It takes a body of text and counts the occurrences of each distinct word. Here we provide some code to perform a word count over data contained in Cassandra. A working example of word count is also included in the Cassandra source download.
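
The counting logic itself does not depend on Hadoop or Cassandra. As a minimal sketch, assuming the text is already in memory as a String, the following plain Java program splits the text on whitespace and tallies each token. The class name PlainWordCount and the method countWords are hypothetical names chosen for illustration; they are not part of the Cassandra or Hadoop APIs.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Cassandra or Hadoop.
public class PlainWordCount {

    // Counts occurrences of each whitespace-delimited token in the text.
    static Map<String, Integer> countWords(String text) {
        // A TreeMap keeps the words in sorted order for readable output.
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            // Insert 1 for a new word, otherwise add 1 to its running total.
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("the quick fox jumps over the lazy dog the end"));
    }
}
```

In the MapReduce version, the tokenizing loop moves into the mapper, which emits each word with a count of one, and the tallying is left to the reduce phase.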

First we need a Mapper class, shown in Example 12-1.

Example 12-1. The TokenizerMapper.java class

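// Imports needed at the top of the enclosing source file:
import java.io.IOException;
import java.util.SortedMap;
import java.util.StringTokenizer;
import org.apache.cassandra.db.IColumn;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Declared as a nested class inside the job's driver class, hence "static".
public static class TokenizerMapper extends Mapper<byte[],
        SortedMap<byte[], IColumn>, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    private String columnName;

    // Emits (word, 1) for each token in the value of the configured column.
    public void map(byte[] key, SortedMap<byte[], IColumn> columns, Context context)
            throws IOException, InterruptedException {
        IColumn column = columns.get(columnName.getBytes());
        if (column == null) {
            return; // this row has no such column, so there is nothing to count
        }
        String value = new String(column.value());
        StringTokenizer itr = new StringTokenizer(value);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }

    // Reads the name of the column to tokenize from the job configuration.
    protected void setup(Context context)
            throws IOException, InterruptedException {
        this.columnName = context.getConfiguration().get("column_name");
    }
}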
