ColumnFamilyRecordReader
The layer at which individual records from Cassandra are read. It's an extension of Hadoop's
RecordReader abstract class.
There are similar classes for outputting data to Cassandra in the Hadoop package, but at the time
of this writing, those classes are still being finalized.
Running the Word Count Example
Word count is one of the examples given in the MapReduce paper and is the starting point for
many who are new to the framework. It takes a body of text and counts the occurrences of each
distinct word. Here we provide some code to perform a word count over data contained in Cassandra.
A working example of word count is also included in the Cassandra source download.
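Before bringing Hadoop and Cassandra into the picture, it may help to see the computation itself. The following is a minimal plain-Java sketch, not the code from the Cassandra source download: the class name WordCountSketch and the method countWords are hypothetical, and the input is an ordinary string rather than column data. Each token plays the role of an emitted (word, 1) pair, and the summing that a reduce step would perform is folded into Map.merge.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Cassandra or Hadoop.
public class WordCountSketch {

    // Splits the text on whitespace and counts how often each distinct token occurs.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            // Equivalent to emitting (word, 1) and summing the ones per word.
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints each distinct word with its count, in sorted key order.
        System.out.println(countWords("to be or not to be"));
    }
}
```

In a MapReduce job the emitting and the summing happen in separate phases on different machines, which is why the Hadoop version is split into separate classes.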
First we need a Mapper class, shown in Example 12-1.
Example 12-1. The TokenizerMapper.java class
public static class TokenizerMapper
        extends Mapper<byte[], SortedMap<byte[], IColumn>, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    private String columnName;

    // Emits (word, 1) for every token in the value of the configured column.
    @Override
    public void map(byte[] key, SortedMap<byte[], IColumn> columns, Context context)
            throws IOException, InterruptedException {
        IColumn column = columns.get(columnName.getBytes());
        if (column == null) {
            return; // this row has no such column, so there is nothing to count
        }
        String value = new String(column.value());
        StringTokenizer itr = new StringTokenizer(value);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }

    // Reads the name of the column to tokenize from the job configuration.
    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        this.columnName = context.getConfiguration().get("column_name");
    }
}
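One detail of the mapper is easy to miss: columns.get(columnName.getBytes()) looks up a value by a freshly created byte array. Java arrays have no natural ordering and are not compared by content by default, so such a lookup can only succeed if the sorted map was built with a comparator that compares array contents. The sketch below illustrates that idea in plain Java under stated assumptions: the class ByteKeyLookupSketch, the BY_CONTENT comparator, and the lookup method are hypothetical, String values stand in for IColumn, and the unsigned byte ordering is an assumption rather than a statement of how Cassandra builds the map it passes to the mapper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Cassandra or Hadoop.
public class ByteKeyLookupSketch {

    // Orders byte arrays by content (bytes treated as unsigned); requires Java 9+.
    static final Comparator<byte[]> BY_CONTENT = (a, b) -> Arrays.compareUnsigned(a, b);

    // Looks up a column by name; returns null when the row has no such column.
    static String lookup(SortedMap<byte[], String> columns, String columnName) {
        return columns.get(columnName.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        SortedMap<byte[], String> columns = new TreeMap<>(BY_CONTENT);
        columns.put("text".getBytes(StandardCharsets.UTF_8), "hello world");

        System.out.println(lookup(columns, "text"));    // found through the comparator
        System.out.println(lookup(columns, "missing")); // absent column yields null
    }
}
```

Because an absent column yields null, code that reads the column's value should check the result of the lookup before using it.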