StringTokenizer tokenizer = new StringTokenizer(val);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}
}
This is what our Mapper looks like. To anyone with some experience writing MapReduce
programs, it does not deviate much from a regular Mapper. Here are a few things to note:
• Cassandra feeds the Mapper a sorted map of columns. The map is sorted by column
name, and each entry is a column-name, column-value pair.
• The key passed to the Mapper is the row key, of type ByteBuffer.
• Use org.apache.cassandra.utils.ByteBufferUtil to convert a
ByteBuffer into meaningful types.
• If you want to process the row column by column, loop through the sorted map of
columns.
• Write out the output that you want this Mapper to forward to the Reducer. The
values that you write to the context are sorted and grouped by the framework and
forwarded to the Reducer.
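As the bullets note, row keys and column values arrive as ByteBuffer, and ByteBufferUtil is essentially a convenience layer over java.nio. The same round trip can be sketched with the standard library alone; the class and method names below are illustrative stand-ins, not the Cassandra API:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferDemo {
    // Decode a ByteBuffer into a String without disturbing its position,
    // similar in spirit to ByteBufferUtil.string(...).
    static String decodeString(ByteBuffer buf) {
        // duplicate() shares the bytes but has its own position/limit,
        // so the caller's buffer is left untouched.
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer rowKey =
                ByteBuffer.wrap("user42".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeString(rowKey)); // prints "user42"
    }
}
```

Decoding a duplicate rather than the buffer itself matters in a Mapper: the framework may hand the same ByteBuffer to other code, and consuming its position would corrupt later reads.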
Now that we have done the basic task of splitting the text in each column and forwarding
each word as a key with the value ONE, we need to gather all the occurrences of a word
that the Mapper forwarded into one place. There, we can iterate over the grouped
key-value pairs of word and ONE, updating a counter until every occurrence of that word
has been counted. Here is how our Reducer looks:
public static class WordReducer extends Reducer<Text,
        IntWritable, ByteBuffer, List<Mutation>> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // Emit the word as the row key and a mutation holding the final
        // count. (getMutation is a helper, not shown here, that wraps the
        // count in a Cassandra column mutation.)
        context.write(ByteBufferUtil.bytes(key.toString()),
                Collections.singletonList(getMutation(key, sum)));
    }
}
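To see why the Reducer receives everything it needs, the whole map, shuffle, and reduce pipeline can be simulated in a few lines of plain Java. A TreeMap stands in for the framework's sort-and-group step; the class name and input strings below are made up for illustration:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class ShuffleSketch {
    // Simulates map (tokenize, emit (word, 1)), shuffle (sort and group
    // by key), and reduce (sum the grouped ones) in a single pass.
    static Map<String, Integer> wordCount(String[] columnValues) {
        // TreeMap keeps keys sorted, mimicking the framework's sort phase.
        Map<String, Integer> counts = new TreeMap<>();
        for (String val : columnValues) {
            StringTokenizer tokenizer = new StringTokenizer(val);
            while (tokenizer.hasMoreTokens()) {
                // Grouping equal keys and summing is exactly what the
                // WordReducer's loop does over Iterable<IntWritable>.
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two made-up "column values" standing in for text read from Cassandra.
        System.out.println(wordCount(
                new String[] { "to be or not to be", "be quick" }));
        // prints {be=3, not=1, or=1, quick=1, to=2}
    }
}
```

The real job differs only in where the data comes from and goes to: the input map is fed by Cassandra, and the summed counts are written back as mutations instead of into a local map.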