    int sum = 0;
    // Sum up the counts emitted by the Mapper for this word
    for (IntWritable value : values) {
      sum = sum + value.get();
    }
    // Build a column named "count" that holds the total
    Column col = new Column();
    col.setName(ByteBufferUtil.bytes("count"));
    col.setValue(ByteBufferUtil.bytes(sum));
    col.setTimestamp(System.currentTimeMillis());
    // Wrap the column in a Thrift Mutation
    Mutation mutation = new Mutation();
    mutation.setColumn_or_supercolumn(new ColumnOrSuperColumn());
    mutation.getColumn_or_supercolumn().setColumn(col);
    // Emit the word as the row key and the mutation as its value
    context.write(
        ByteBufferUtil.bytes(key.toString()),
        Collections.singletonList(mutation)
    );
  }
}
The Reducer is a little more interesting than the Mapper, because it does two things. First, it counts the grouped values that arrive from the Mapper; since they are grouped by word, at the end of the loop we have the number of occurrences of that word. Second, it stores this count in Cassandra. Instead of writing the result to HDFS, we store it in Cassandra with the word as the row key, adding a column named count that holds the value we just computed. Notice that no environment-specific configuration is done here: we state what to store in Cassandra and how, and we are done. So, where do we set all the environment-specific and Cassandra-specific settings? The answer is in the main method. Here is how the main method looks for this particular example; it will not vary much in any Cassandra-based Hadoop project:
public class CassandraWordCount extends Configured implements Tool {
  [-- snip --]
  public int run(String[] args) throws Exception {
    Job job = new Job(getConf(), "cassandrawordcount");
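To complete the picture, here is a minimal sketch of how the rest of run() usually looks when writing to Cassandra through the Thrift-era Hadoop helpers, ConfigHelper and ColumnFamilyOutputFormat. The mapper and reducer class names (TokenizerMapper, ReducerToCassandra), the keyspace and column family names ("wordcount", "words"), and the contact point and port are assumptions for illustration, not necessarily the book's actual values:
    job.setJarByClass(CassandraWordCount.class);
    job.setMapperClass(TokenizerMapper.class);      // assumed Mapper class name
    job.setReducerClass(ReducerToCassandra.class);  // assumed Reducer class name
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // The Reducer emits a ByteBuffer row key and a list of mutations
    job.setOutputKeyClass(ByteBuffer.class);
    job.setOutputValueClass(List.class);

    // Input still comes from HDFS as usual
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Output goes to Cassandra instead of HDFS
    job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
    Configuration conf = job.getConfiguration();
    // Environment-specific settings: where the cluster lives and where to write
    ConfigHelper.setOutputInitialAddress(conf, "localhost");  // assumed contact point
    ConfigHelper.setOutputRpcPort(conf, "9160");              // default Thrift port
    ConfigHelper.setOutputPartitioner(conf,
        "org.apache.cassandra.dht.Murmur3Partitioner");
    ConfigHelper.setOutputColumnFamily(conf, "wordcount", "words"); // assumed keyspace/CF

    job.waitForCompletion(true);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new CassandraWordCount(), args));
  }
}
All the cluster-specific knowledge lives in these ConfigHelper calls, which is why the Mapper and Reducer themselves stay portable across environments.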