Readers who have written MapReduce programs will find this mapper immediately familiar. In this case, the inputs to the mapper are row keys and their associated row values from Cassandra. Row values in the world of Cassandra are simply maps containing the column information. In addition to the word count code itself, we override the setup method to set the name of the column we are looking for. The rest of the mapper code is generic to any word count implementation.
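To make that concrete, here is a sketch of what such a mapper can look like. This is a sketch, not the book's original listing: the "column_name" configuration key and the byte[]-based signature of the early Cassandra Hadoop integration are assumptions.

// Requires: java.util.SortedMap, java.util.StringTokenizer,
// org.apache.cassandra.db.IColumn, org.apache.hadoop.io.*,
// org.apache.hadoop.mapreduce.Mapper
public static class TokenizerMapper
    extends Mapper<byte[], SortedMap<byte[], IColumn>, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
  private String columnName;

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    // Pull the target column name from the job configuration;
    // the "column_name" key is an assumption for this sketch.
    columnName = context.getConfiguration().get("column_name");
  }

  public void map(byte[] key, SortedMap<byte[], IColumn> columns, Context context)
      throws IOException, InterruptedException {
    // The row value is a map of column name to IColumn; look up
    // the one column whose text we want to tokenize.
    IColumn column = columns.get(columnName.getBytes());
    if (column == null) {
      return;
    }
    // Tokenize the column's value and emit each word with a count of 1.
    StringTokenizer itr = new StringTokenizer(new String(column.value()));
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}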
NOTE
When iterating over super columns in your mapper, each IColumn would need to be cast to a SuperColumn, and it would contain nested column information.
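As a minimal sketch, assuming the same early API (org.apache.cassandra.db.SuperColumn), that iteration can look like:

// Inside map(), when the input column family is a super column family,
// each top-level IColumn is actually a SuperColumn (a sketch, not the
// book's code).
for (IColumn col : columns.values()) {
  SuperColumn superColumn = (SuperColumn) col;
  for (IColumn subColumn : superColumn.getSubColumns()) {
    // subColumn carries the nested column's name and value
  }
}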
Next, let's look at a Reducer implementation for our word count, shown in Example 12-2.
Example 12-2. The Reducer implementation
public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private IntWritable result = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // Sum every count emitted for this word across all mappers
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}
There should be nothing surprising in this reducer; nothing is Cassandra-specific.
Finally, we get to the class that runs our MapReduce program, shown in Example 12-3.
Example 12-3. The WordCount class runs the MapReduce program
public class WordCount extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    Job job = new Job(getConf(), "wordcount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
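    // The excerpt ends here in the source. What follows is a sketch of
    // how such a run() method is typically completed, not the book's
    // listing: ColumnFamilyInputFormat and ConfigHelper are the
    // Cassandra Hadoop classes of this era, and the keyspace, column
    // family, and output path names are placeholders.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/word_count")); // hypothetical path

    // Tell Hadoop to read its input splits from Cassandra.
    job.setInputFormatClass(ColumnFamilyInputFormat.class);
    ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");
    // A SlicePredicate selecting the column to count would also be set
    // through ConfigHelper here.

    return job.waitForCompletion(true) ? 0 : 1;
  }
}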