Cassandra and Hadoop in action
Now, with more than enough (rather boring) theory behind us, we are ready to do something
exciting. In this section, we will do a word count of a book. It will be more interesting
than the grep example.
In this example, we load Lewis Carroll's novel Alice in Wonderland
(http://en.wikipedia.org/wiki/Alice%27s_Adventures_in_Wonderland) into Cassandra. To
prepare this data, we read the text file line by line and store 500 lines in one row. The
row names are formatted as row_1, row_2, and so on, and the columns in each row have
names such as col_1, col_2, and so on. Each row has up to 500 columns, and each column
holds one line from the file. To avoid noise, we removed the punctuation from the lines
during the load. We could certainly handle the noise reduction in the MapReduce code, but
we wanted to keep it simple. What follows is the code and its explanation. It is
recommended to download the code either from my GitHub account or from the book's
website and keep it handy while reading this chapter. The code is compiled into a JAR
file, which is then submitted to Hadoop MapReduce for execution. We use the mvn clean
install Maven command to compile the code and create the JAR file. If you are unfamiliar
with Maven or new to Java, you can compile the files with the appropriate dependencies or
JAR files in the classpath. Refer to the pom.xml file in the project to find the JAR files
you need to compile the example.
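The row and column layout described above can be sketched in plain Java without touching Cassandra. This is only a minimal illustration, not the actual loader that accompanies this chapter: the class name AliceRowLayout, its two methods, and the choice to restart column numbering at col_1 in every row are assumptions made for the sketch, and the rows are held in an in-memory map rather than written to a column family.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: models the row_N / col_N layout in memory only.
class AliceRowLayout {

    // Remove punctuation, keeping letters, digits, and whitespace.
    static String stripPunctuation(String line) {
        return line.replaceAll("[^\\p{L}\\p{N}\\s]", "");
    }

    // Group lines into rows holding at most linesPerRow columns each.
    // Assumption: column numbering restarts at col_1 in every row.
    static Map<String, Map<String, String>> toRows(List<String> lines, int linesPerRow) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String rowKey = "row_" + (i / linesPerRow + 1);
            String colName = "col_" + (i % linesPerRow + 1);
            rows.computeIfAbsent(rowKey, k -> new LinkedHashMap<>())
                .put(colName, stripPunctuation(lines.get(i)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Alice was beginning,", "to get very tired!", "of sitting by her sister");
        // Two lines per row keeps the demo small; the text above uses 500.
        System.out.println(toRows(lines, 2));
    }
}
```

With 500 lines per row, the final row simply holds whatever lines remain, which is why a row can contain fewer than 500 columns.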
Assuming that the data is ready in Cassandra for MapReduce to run on, we will write
a Mapper, a Reducer, and a main method. Here is the Mapper:
public static class WordMapper extends Mapper<ByteBuffer,
    SortedMap<ByteBuffer, IColumn>, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private Text word = new Text();

  @Override
  protected void map(ByteBuffer key, SortedMap<ByteBuffer,
      IColumn> cols, Context context)
      throws IOException, InterruptedException {
    // Iterate through the column values
    for (IColumn col : cols.values()) {
      String val = ByteBufferUtil.string(col.value());