ConfigHelper.setOutputColumnFamily(conf, Setup.KEYSPACE, Setup.OUTPUT_CF);
// set output class types
job.setOutputKeyClass(ByteBuffer.class);
job.setOutputValueClass(List.class);
job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// wait for the job to finish, printing progress verbosely
job.waitForCompletion(true);
return 0;
}
public static void main(String[] args) throws Exception {
// propagate the exit status returned by run() back to the shell
System.exit(ToolRunner.run(new Configuration(), new CassandraWordCount(), args));
}
}
All right, we went through a lot of things, but nothing we have not seen before. Starting from the main method, we hand ToolRunner an instance of our main class along with any parameters passed from the CLI. ToolRunner then executes the run method, which holds all the environment and Cassandra settings. This is also where we tell Hadoop where the Mapper and Reducer for this job are.
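The driver shape this paragraph describes can be sketched as follows. This is a skeleton, not the full listing: the body of run() (job configuration, Mapper, Reducer, Cassandra input/output settings) is elided and follows the code shown earlier in this section.

```java
// Sketch of the Hadoop driver skeleton: the class implements Tool so that
// ToolRunner can parse generic Hadoop options before invoking run().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class CassandraWordCount extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Job setup goes here: environment and Cassandra settings,
        // Mapper/Reducer classes, input/output formats (see listing above).
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic options from args, then calls run()
        System.exit(ToolRunner.run(new Configuration(), new CassandraWordCount(), args));
    }
}
```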
We tell Hadoop how to pull data from Cassandra by providing a SlicePredicate, where we pull a complete row by leaving the start and end column names unset and setting the count to 2 billion. Alternatively, one can simply set the wide-row flag to true and achieve the same result without worrying about the SlicePredicate at all.
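The slice configuration described above can be sketched like this, assuming the pre-CQL Thrift classes (SlicePredicate, SliceRange) and ConfigHelper from the Cassandra Hadoop integration; empty byte buffers for start and finish mean "no bound", and conf is the job's Configuration:

```java
import java.nio.ByteBuffer;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

// An unbounded SliceRange with a very large count pulls the whole row
SliceRange sliceRange = new SliceRange(
        ByteBuffer.wrap(new byte[0]),   // no start column name
        ByteBuffer.wrap(new byte[0]),   // no end column name
        false,                          // not reversed
        Integer.MAX_VALUE);             // count: roughly 2 billion
SlicePredicate predicate = new SlicePredicate().setSlice_range(sliceRange);
ConfigHelper.setInputSlicePredicate(conf, predicate);

// Alternatively, declare the input a wide row and skip the predicate;
// the final boolean argument is the wide-row flag:
// ConfigHelper.setInputColumnFamily(conf, Setup.KEYSPACE, Setup.INPUT_CF, true);
```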
If you are planning to use CQL input and output with Hadoop, the configuration looks
like this: