MapReduce with Cassandra - Beginning Apache Cassandra Development

Database Reference

In-Depth Information

Configuration conf = new Configuration();

String[] otherArgs = new

GenericOptionsParser(conf,

args).getRemainingArgs();

Job job = new Job(conf, "tweet count");

job.setJarByClass(TwitterCassandraJob.class);

// mapper configuration.

job.setMapperClass(TweetMapper.class);

job.setMapOutputKeyClass(Text.class);

job.setMapOutputValueClass(IntWritable.class);

job.setInputFormatClass(ColumnFamilyInputFormat.class);

// Reducer configuration

job.setReducerClass(TweetAggregator.class);

job.setOutputKeyClass(ByteBuffer.class);

job.setOutputValueClass(List.class);

job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

4.

Next, we need to configure the MapReduce job for input family and

format configuration:

// Cassandra input column family configuration

ConfigHelper.setInputRpcPort(job.getConfiguration(),

"9160");

ConfigHelper.setInputInitialAddress(job.getConfiguration(),

"localhost");

ConfigHelper.setInputPartitioner(job.getConfiguration(),

"Murmur3Partitioner");

ConfigHelper.setInputColumnFamily(job.getConfiguration(),

KEYSPACE_NAME, INPUT_COLUMN_FAMILY);

job.setInputFormatClass(ColumnFamilyInputFormat.class);

5.

Since we need to fetch records for a specific indexed column ( user ),

we will be using the Thrift Slice API to configure the MapReduce job

with the mapper class for filtered streaming.

Search WWH ::

Custom Search

Home