Database Reference
In-Depth Information
Configuration conf = new Configuration();
String[] otherArgs = new
GenericOptionsParser(conf,
args).getRemainingArgs();
Job job = new Job(conf, "tweet count");
job.setJarByClass(TwitterCassandraJob.class);
// mapper configuration.
job.setMapperClass(TweetMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setInputFormatClass(ColumnFamilyInputFormat.class);
// Reducer configuration
job.setReducerClass(TweetAggregator.class);
job.setOutputKeyClass(ByteBuffer.class);
job.setOutputValueClass(List.class);
job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
4.
Next, we need to configure the MapReduce job for input family and
format configuration:
// Cassandra input column family configuration
ConfigHelper.setInputRpcPort(job.getConfiguration(),
"9160");
ConfigHelper.setInputInitialAddress(job.getConfiguration(),
"localhost");
ConfigHelper.setInputPartitioner(job.getConfiguration(),
"Murmur3Partitioner");
ConfigHelper.setInputColumnFamily(job.getConfiguration(),
KEYSPACE_NAME, INPUT_COLUMN_FAMILY);
job.setInputFormatClass(ColumnFamilyInputFormat.class);
5.
Since we need to fetch records for a specific indexed column ( user ),
we will be using the Thrift Slice API to configure the MapReduce job
with the mapper class for filtered streaming.
Search WWH ::




Custom Search