    ConfigHelper.setOutputColumnFamily(conf, Setup.KEYSPACE,
        Setup.OUTPUT_CF);

    // set output class types
    job.setOutputKeyClass(ByteBuffer.class);
    job.setOutputValueClass(List.class);
    job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    // verbose
    job.waitForCompletion(true);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    ToolRunner.run(new Configuration(), new CassandraWordCount(), args);
    System.exit(0);
  }
}
All right, we went through a lot of things there, but nothing we have not seen before. Starting from the main method, we hand ToolRunner an instance of our main class along with any parameters passed from the CLI. ToolRunner then executes the run method, which holds all the environment and Cassandra settings. The run method is also where we tell Hadoop which Mapper and Reducer classes to use for this job.
We tell Hadoop how to pull data from Cassandra by providing a SlicePredicate that selects the complete row: we leave the start and end column names unset and set the count to 2 billion. Alternatively, one may simply set the wide-row option to true and achieve the same result without worrying about SlicePredicate at all.
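As a sketch of what that predicate setup might look like (assuming the Thrift-based SlicePredicate and SliceRange classes from org.apache.cassandra.thrift, and ConfigHelper from org.apache.cassandra.hadoop; the class and method names below are illustrative, not the book's exact listing):

```java
import java.nio.ByteBuffer;

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.hadoop.conf.Configuration;

public class InputPredicateSketch {

  // Hypothetical helper: configure the job input to fetch whole rows.
  public static void configure(Configuration conf) {
    SliceRange range = new SliceRange(
        ByteBuffer.wrap(new byte[0]),  // empty start column name
        ByteBuffer.wrap(new byte[0]),  // empty end column name
        false,                         // not reversed
        2000000000);                   // count: 2 billion columns

    SlicePredicate predicate = new SlicePredicate().setSlice_range(range);
    ConfigHelper.setInputSlicePredicate(conf, predicate);

    // Alternative mentioned in the text: enable wide-row support instead
    // (the final boolean argument) and skip the predicate entirely:
    // ConfigHelper.setInputColumnFamily(conf, keyspace, columnFamily, true);
  }
}
```

The empty start/end ByteBuffers plus a huge count is the conventional way of saying "every column of the row" in the Thrift slice API; the wide-row flag instead streams columns in pages, which is safer for rows too large to fit in memory.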
If you are planning to use CQL input and output with Hadoop, the configuration looks
like this:
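The listing itself does not survive in this excerpt. As a hedged sketch of such a configuration (assuming the org.apache.cassandra.hadoop.cql3 classes CqlConfigHelper, CqlInputFormat, and CqlOutputFormat; the keyspace and table names are made up for illustration):

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlInputFormat;
import org.apache.cassandra.hadoop.cql3.CqlOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CqlJobConfigSketch {

  // Hypothetical helper: wire CQL-based input and output into a Hadoop job.
  public static void configure(Job job) {
    Configuration conf = job.getConfiguration();

    // Input side: where to connect and which table to read.
    ConfigHelper.setInputInitialAddress(conf, "localhost");
    ConfigHelper.setInputPartitioner(conf,
        "org.apache.cassandra.dht.Murmur3Partitioner");
    ConfigHelper.setInputColumnFamily(conf, "wordcount_ks", "input_table");
    // Rows fetched per page from Cassandra.
    CqlConfigHelper.setInputCQLPageRowSize(conf, "100");
    job.setInputFormatClass(CqlInputFormat.class);

    // Output side: bound variables in the UPDATE are filled from the
    // reducer's output values.
    ConfigHelper.setOutputColumnFamily(conf, "wordcount_ks", "output_table");
    CqlConfigHelper.setOutputCql(conf,
        "UPDATE wordcount_ks.output_table SET count = ?");
    job.setOutputFormatClass(CqlOutputFormat.class);
  }
}
```

With the CQL formats, the mapper receives rows as column-name-to-value maps and the reducer emits the bound values for the prepared UPDATE statement, so no SlicePredicate is involved at all.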