StringTokenizer tokenizer = new StringTokenizer(val);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}
}
This is what our Mapper looks like. To anyone with some experience writing MapReduce
programs, it does not deviate much from a regular Mapper. Here are a few things to note:
• Cassandra feeds the Mapper a sorted map of columns. The map is sorted by column
name, and each entry is a column-name, column-value pair.
• The key passed to the Mapper is the row key, of type ByteBuffer.
• Use org.apache.cassandra.utils.ByteBufferUtil to convert a
ByteBuffer into meaningful types.
• If you want to process the row column by column, loop through the sorted map of
columns.
• Write out the output that you want this Mapper to forward to the Reducer. The
values that you write to the context are sorted and grouped by the framework and
forwarded to the Reducer.
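As the bullets note, row keys and column values arrive as ByteBuffer, and ByteBufferUtil is essentially a convenience layer over java.nio. The same round trip can be sketched with the standard library alone; the class and method names below are illustrative stand-ins, not the Cassandra API:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferDemo {
    // Decode a ByteBuffer into a String without disturbing its position,
    // similar in spirit to ByteBufferUtil.string(...).
    static String decodeString(ByteBuffer buf) {
        // duplicate() shares the bytes but has its own position/limit,
        // so the caller's buffer is left untouched.
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer rowKey =
                ByteBuffer.wrap("user42".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeString(rowKey)); // prints "user42"
    }
}
```

Decoding a duplicate rather than the buffer itself matters in a Mapper: the framework may hand the same ByteBuffer to other code, and consuming its position would corrupt later reads.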
Now that we have done the basic task of splitting the text in each column and forwarding
each word as a key with the value ONE, we need to gather all the occurrences of a word
that the Mapper forwarded into one place. There, we can iterate over the grouped
key-value pairs of word and ONE, updating a counter until every occurrence of that word
has been counted. Here is how our Reducer looks:
public static class WordReducer extends Reducer<Text,
        IntWritable, ByteBuffer, List<Mutation>> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // Emit the word as the row key and a mutation holding the final
        // count. (getMutation is a helper, not shown here, that wraps the
        // count in a Cassandra column mutation.)
        context.write(ByteBufferUtil.bytes(key.toString()),
                Collections.singletonList(getMutation(key, sum)));
    }
}
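To see why the Reducer receives everything it needs, the whole map, shuffle, and reduce pipeline can be simulated in a few lines of plain Java. A TreeMap stands in for the framework's sort-and-group step; the class name and input strings below are made up for illustration:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class ShuffleSketch {
    // Simulates map (tokenize, emit (word, 1)), shuffle (sort and group
    // by key), and reduce (sum the grouped ones) in a single pass.
    static Map<String, Integer> wordCount(String[] columnValues) {
        // TreeMap keeps keys sorted, mimicking the framework's sort phase.
        Map<String, Integer> counts = new TreeMap<>();
        for (String val : columnValues) {
            StringTokenizer tokenizer = new StringTokenizer(val);
            while (tokenizer.hasMoreTokens()) {
                // Grouping equal keys and summing is exactly what the
                // WordReducer's loop does over Iterable<IntWritable>.
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two made-up "column values" standing in for text read from Cassandra.
        System.out.println(wordCount(
                new String[] { "to be or not to be", "be quick" }));
        // prints {be=3, not=1, or=1, quick=1, to=2}
    }
}
```

The real job differs only in where the data comes from and goes to: the input map is fed by Cassandra, and the summed counts are written back as mutations instead of into a local map.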