// create column family tweetcount.
create column family tweetcount with
comparator='UTF8Type' and
key_validation_class = 'UTF8Type' and
column_metadata=[{column_name:'count',
validation_class:'Int32Type'}];
2. Let's create a MapReduce job. We need to create a Hadoop configuration instance and configure the NameNode host and port:
Configuration conf = new Configuration();
// Change this as per your Hadoop configuration.
conf.set("fs.default.name", "hdfs://localhost:9000");
conf.set("mapred.child.java.opts", "-Xms1024m -Xmx2g -XX:+UseSerialGC");
conf.set("mapred.job.map.memory.mb", "4096");
conf.set("mapred.job.reduce.memory.mb", "2048");
conf.set("mapreduce.map.ulimit", "1048576");
conf.set("mapred.job.reduce.physical.mb", "2048");
conf.set("mapred.job.map.physical.mb", "2048");
Note: You may want to change fs.default.name if the NameNode is running on a remote machine.
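If the NameNode address changes between environments, it can help to build the fs.default.name value from a host and port rather than hard-coding the string. The following is a minimal sketch using only the JDK; the class NameNodeUri and its method of are hypothetical names introduced here for illustration and are not part of Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not a Hadoop class): builds the string passed to
// conf.set("fs.default.name", ...) from a NameNode host and port.
final class NameNodeUri {
    private NameNodeUri() {}

    static String of(String host, int port) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        try {
            // The seven-argument URI constructor assembles and validates
            // scheme://host:port; path, query, and fragment are left out.
            return new URI("hdfs", null, host, port, null, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid NameNode address", e);
        }
    }
}
```

With this helper, the setting above could be written as conf.set("fs.default.name", NameNodeUri.of("localhost", 9000)); for a remote cluster only the host and port arguments change.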
3. Let's now configure the MapReduce job for the mapper and reducer:
//Mapper configuration
job.setMapperClass(TweetTokenizer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// Reducer configuration
job.setReducerClass(TweetAggregator.class);
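The bodies of TweetTokenizer and TweetAggregator are not shown in this section. The map output types above (a Text key and an IntWritable value) suggest the usual pattern of emitting a key with a count of 1 in the mapper and summing those counts in the reducer. The sketch below illustrates that pattern in plain Java with no Hadoop dependency; the class TweetCountSketch, the whitespace tokenization, and the lower-casing are assumptions made for illustration, not the book's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce steps (assumed behavior).
final class TweetCountSketch {
    private TweetCountSketch() {}

    // Map step (assumption): split a tweet on whitespace and emit each
    // lower-cased token; in Hadoop each token would be written with value 1.
    static List<String> tokenize(String tweet) {
        List<String> tokens = new ArrayList<>();
        for (String token : tweet.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Reduce step (assumption): sum the 1s emitted for each distinct key.
    static Map<String, Integer> aggregate(List<String> tweets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tweet : tweets) {
            for (String token : tokenize(tweet)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

In the real job, each per-key total would be written to the count column of the tweetcount column family, which is why that column is validated as Int32Type.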