Cassandra with Hadoop MapReduce
Cassandra provides built-in support for Hadoop. If you have ever written a MapReduce
program, you will find that writing a MapReduce task with Cassandra is quite similar to
writing one for data stored in HDFS. Cassandra supports input to Hadoop with the
ColumnFamilyInputFormat class and output with the ColumnFamilyOutputFormat class. Apart
from these, you will need to supply Cassandra-specific settings to Hadoop via ConfigHelper.
These three classes are enough to get you started. Another class that might be worth
looking at is BulkOutputFormat. All of these classes are under the
org.apache.cassandra.hadoop.* package.
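As a rough illustration of how these three classes fit together, here is a minimal job-setup sketch. It assumes a Cassandra 1.x-era cassandra-all.jar (with the Thrift-based Hadoop classes) and the Hadoop MapReduce JARs on the classpath. The keyspace, column family names, host, and port are hypothetical placeholders, and the mapper and reducer are deliberately left out.

```java
import java.io.IOException;

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CassandraJobSetup {
    // Hypothetical names and addresses, used only for illustration.
    static final String KEYSPACE = "demo_ks";
    static final String INPUT_CF = "events";
    static final String OUTPUT_CF = "event_counts";
    static final String HOST = "127.0.0.1";
    static final String RPC_PORT = "9160";
    static final String PARTITIONER = "org.apache.cassandra.dht.Murmur3Partitioner";

    /** Applies Cassandra input and output settings to the given job. */
    public static void configure(Job job) {
        // Job keeps its own copy of the Configuration, so always edit that copy.
        Configuration conf = job.getConfiguration();

        // Input side: read rows from a column family.
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        ConfigHelper.setInputInitialAddress(conf, HOST);
        ConfigHelper.setInputRpcPort(conf, RPC_PORT);
        ConfigHelper.setInputPartitioner(conf, PARTITIONER);
        ConfigHelper.setInputColumnFamily(conf, KEYSPACE, INPUT_CF);

        // The slice predicate selects which columns of each row reach the mapper.
        // An empty start and finish with a large count asks for every column.
        SliceRange allColumns = new SliceRange(
                ByteBufferUtil.EMPTY_BYTE_BUFFER,
                ByteBufferUtil.EMPTY_BYTE_BUFFER,
                false,
                Integer.MAX_VALUE);
        ConfigHelper.setInputSlicePredicate(
                conf, new SlicePredicate().setSlice_range(allColumns));

        // Output side: write mutations back to another column family.
        job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
        ConfigHelper.setOutputInitialAddress(conf, HOST);
        ConfigHelper.setOutputRpcPort(conf, RPC_PORT);
        ConfigHelper.setOutputPartitioner(conf, PARTITIONER);
        ConfigHelper.setOutputColumnFamily(conf, KEYSPACE, OUTPUT_CF);
    }

    public static void main(String[] args) throws IOException {
        Job job = new Job(new Configuration(), "cassandra-mapreduce-sketch");
        job.setJarByClass(CassandraJobSetup.class);
        configure(job);
        // A real job would set its mapper and reducer classes here and then
        // call job.waitForCompletion(true); this sketch stops after configuration.
        System.out.println("Configured input keyspace: "
                + ConfigHelper.getInputKeyspace(job.getConfiguration()));
    }
}
```

The settings are applied to job.getConfiguration() rather than to the Configuration passed to the Job constructor, because the job works on its own copy and changes made to the original afterwards would not be seen.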
To compile MapReduce code that uses Cassandra as a data source or data sink, you must have
cassandra-all.jar in your classpath. You will also need to make the JARs in the Cassandra
library visible to Hadoop. We will discuss this later in this chapter.
Let's understand the classes that we will be using to get Cassandra working for our
MapReduce problem.