MapReduce with Cassandra - Beginning Apache Cassandra Development

// Create the tweetcount column family.
create column family tweetcount with
  comparator = 'UTF8Type' and
  key_validation_class = 'UTF8Type' and
  column_metadata = [{column_name: 'count', validation_class: 'Int32Type'}];

2. Let's create a MapReduce job. We need to create a Hadoop configuration instance and configure the NameNode host and port:

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://localhost:9000");
// Change this as per your Hadoop configuration.
conf.set("mapred.child.java.opts", "-Xms1024m -Xmx2g -XX:+UseSerialGC");
conf.set("mapred.job.map.memory.mb", "4096");
conf.set("mapred.job.reduce.memory.mb", "2048");
conf.set("mapreduce.map.ulimit", "1048576");
conf.set("mapred.job.reduce.physical.mb", "2048");
conf.set("mapred.job.map.physical.mb", "2048");

Note: You may want to change fs.default.name if your NameNode is running on a remote machine.
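If the NameNode address differs between environments, one option is to build the fs.default.name value from a host and port instead of hardcoding it. The following is a minimal sketch under that assumption; the NameNodeAddress class and its hdfsUri method are hypothetical helpers, not part of the book's code or of Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not from the book): builds the value for fs.default.name.
class NameNodeAddress {

    // Builds an hdfs:// URI string from a host and port, rejecting bad input early.
    static String hdfsUri(String host, int port) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        try {
            // The 7-argument URI constructor validates the host as a server authority.
            return new URI("hdfs", null, host, port, null, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid NameNode address", e);
        }
    }

    public static void main(String[] args) {
        // Assumed convention: host is the first argument, defaulting to localhost.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(hdfsUri(host, 9000));
    }
}
```

The result could then be passed to the configuration from step 2, for example conf.set("fs.default.name", NameNodeAddress.hdfsUri(host, 9000)), where host is whatever machine runs your NameNode.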

3. Let's now configure the MapReduce job for the mapper and reducer:

// Mapper configuration
job.setMapperClass(TweetTokenizer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

// Reducer configuration
job.setReducerClass(TweetAggregator.class);
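The implementations of TweetTokenizer and TweetAggregator are not shown at this point. To illustrate the data flow that the Text/IntWritable output types suggest, here is a plain-Java sketch with no Hadoop classes. It assumes the mapper emits a (token, 1) pair for each word in a tweet and the reducer sums the values per key; the class name TweetCountSketch and both method bodies are illustrative assumptions, not the book's code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce phases; no Hadoop classes are used.
class TweetCountSketch {

    // Map phase (assumed behavior): split a tweet into lowercase tokens, emit (token, 1).
    static List<Map.Entry<String, Integer>> map(String tweet) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : tweet.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase (assumed behavior): sum the values emitted for each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample tweets are made up for the demonstration.
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String tweet : Arrays.asList("cassandra with hadoop", "hadoop mapreduce")) {
            emitted.addAll(map(tweet));
        }
        System.out.println(reduce(emitted));
    }
}
```

In a real job, Hadoop performs the grouping by key between the two phases, and the per-key totals would be what ends up in the count column of tweetcount.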
