Readers who have written MapReduce programs will find this mapper immediately familiar. In this case, the inputs to the mapper are row keys and their associated row values from Cassandra. Row values in the world of Cassandra are simply maps containing the column information. In addition to the word count code itself, we override the setup method to set the name of the column we are looking for. The rest of the mapper code is generic to any word count implementation.
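To make that concrete, here is a sketch of what such a mapper can look like. This is a sketch, not the book's original listing: the "column_name" configuration key and the byte[]-based signature of the early Cassandra Hadoop integration are assumptions.

// Requires: java.util.SortedMap, java.util.StringTokenizer,
// org.apache.cassandra.db.IColumn, org.apache.hadoop.io.*,
// org.apache.hadoop.mapreduce.Mapper
public static class TokenizerMapper
    extends Mapper<byte[], SortedMap<byte[], IColumn>, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
  private String columnName;

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    // Pull the target column name from the job configuration;
    // the "column_name" key is an assumption for this sketch.
    columnName = context.getConfiguration().get("column_name");
  }

  public void map(byte[] key, SortedMap<byte[], IColumn> columns, Context context)
      throws IOException, InterruptedException {
    // The row value is a map of column name to IColumn; look up
    // the one column whose text we want to tokenize.
    IColumn column = columns.get(columnName.getBytes());
    if (column == null) {
      return;
    }
    // Tokenize the column's value and emit each word with a count of 1.
    StringTokenizer itr = new StringTokenizer(new String(column.value()));
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}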
NOTE
When iterating over super columns in your mapper, each IColumn would need to be cast to a SuperColumn, and it would contain nested column information.
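As a minimal sketch, assuming the same early API (org.apache.cassandra.db.SuperColumn), that iteration can look like:

// Inside map(), when the input column family is a super column family,
// each top-level IColumn is actually a SuperColumn (a sketch, not the
// book's code).
for (IColumn col : columns.values()) {
  SuperColumn superColumn = (SuperColumn) col;
  for (IColumn subColumn : superColumn.getSubColumns()) {
    // subColumn carries the nested column's name and value
  }
}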
Next, let's look at a Reducer implementation for our word count, shown in Example 12-2.
Example 12-2. The Reducer implementation
public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private IntWritable result = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // Sum every count emitted for this word across all mappers
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}
There should be nothing surprising in this reducer; nothing is Cassandra-specific.
Finally, we get to the class that runs our MapReduce program, shown in Example 12-3.
Example 12-3. The WordCount class runs the MapReduce program
public class WordCount extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    Job job = new Job(getConf(), "wordcount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
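    // The excerpt ends here in the source. What follows is a sketch of
    // how such a run() method is typically completed, not the book's
    // listing: ColumnFamilyInputFormat and ConfigHelper are the
    // Cassandra Hadoop classes of this era, and the keyspace, column
    // family, and output path names are placeholders.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/word_count")); // hypothetical path

    // Tell Hadoop to read its input splits from Cassandra.
    job.setInputFormatClass(ColumnFamilyInputFormat.class);
    ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");
    // A SlicePredicate selecting the column to count would also be set
    // through ConfigHelper here.

    return job.waitForCompletion(true) ? 0 : 1;
  }
}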