    int sum = 0;
    // Sum up the counts emitted by the Mapper for this word
    for (IntWritable value : values) {
      sum = sum + value.get();
    }
    // Build a column named "count" that holds the total
    Column col = new Column();
    col.setName(ByteBufferUtil.bytes("count"));
    col.setValue(ByteBufferUtil.bytes(sum));
    col.setTimestamp(System.currentTimeMillis());
    // Wrap the column in a Thrift Mutation
    Mutation mutation = new Mutation();
    mutation.setColumn_or_supercolumn(new ColumnOrSuperColumn());
    mutation.getColumn_or_supercolumn().setColumn(col);
    // Emit the word as the row key and the mutation as its value
    context.write(
        ByteBufferUtil.bytes(key.toString()),
        Collections.singletonList(mutation)
    );
  }
}
The Reducer is a little more interesting than the Mapper, because it does two things. First, it counts the grouped values that arrive from the Mapper; since they are grouped by word, at the end of the loop we have the number of occurrences of that word. Second, it stores this count in Cassandra. Instead of writing the result to HDFS, we store it in Cassandra with the word as the row key, adding a column named count that holds the value we just computed. Notice that no environment-specific configuration is done here: we state what to store in Cassandra and how, and we are done. So, where do we set all the environment-specific and Cassandra-specific settings? The answer is in the main method. Here is how the main method looks for this particular example; it will not vary much in any Cassandra-based Hadoop project:
public class CassandraWordCount extends Configured implements Tool {
  [-- snip --]
  public int run(String[] args) throws Exception {
    Job job = new Job(getConf(), "cassandrawordcount");
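To complete the picture, here is a minimal sketch of how the rest of run() usually looks when writing to Cassandra through the Thrift-era Hadoop helpers, ConfigHelper and ColumnFamilyOutputFormat. The mapper and reducer class names (TokenizerMapper, ReducerToCassandra), the keyspace and column family names ("wordcount", "words"), and the contact point and port are assumptions for illustration, not necessarily the book's actual values:
    job.setJarByClass(CassandraWordCount.class);
    job.setMapperClass(TokenizerMapper.class);      // assumed Mapper class name
    job.setReducerClass(ReducerToCassandra.class);  // assumed Reducer class name
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // The Reducer emits a ByteBuffer row key and a list of mutations
    job.setOutputKeyClass(ByteBuffer.class);
    job.setOutputValueClass(List.class);

    // Input still comes from HDFS as usual
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Output goes to Cassandra instead of HDFS
    job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
    Configuration conf = job.getConfiguration();
    // Environment-specific settings: where the cluster lives and where to write
    ConfigHelper.setOutputInitialAddress(conf, "localhost");  // assumed contact point
    ConfigHelper.setOutputRpcPort(conf, "9160");              // default Thrift port
    ConfigHelper.setOutputPartitioner(conf,
        "org.apache.cassandra.dht.Murmur3Partitioner");
    ConfigHelper.setOutputColumnFamily(conf, "wordcount", "words"); // assumed keyspace/CF

    job.waitForCompletion(true);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new CassandraWordCount(), args));
  }
}
All the cluster-specific knowledge lives in these ConfigHelper calls, which is why the Mapper and Reducer themselves stay portable across environments.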