148 else
149 {
150 other_args.add(args[i]);
151 }
152 }
153
154 FileInputFormat.setInputPaths(conf, new Path(other_args.get(0)));
155 FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));
156
157 JobClient.runJob(conf);
158 return 0;
159 }
160 /*------------------------------------------------------------------*/
161 public static void main(String[] args) throws Exception
162 {
163 int res = ToolRunner.run(new Configuration(), new WordCount(), args);
164 System.exit(res);
165 }
166
167 } /* class word count*/
Describing the Example 2 Code
Take a closer look at the code for this second example, which builds on the simpler example given earlier. As before, line 1 defines the package name as org.myorg, and lines 6 through 11 import the Hadoop functionality for Path, configuration, I/O, MapReduce, and utilities.
New to this second example is the distributed-cache import, which is used to ship the job configuration's pattern file (described later) to the task nodes:
07 import org.apache.hadoop.filecache.DistributedCache;
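Neither side of the cache hand-off appears in the listing fragment above, so the sketch below (not part of the book's numbered listing; the `-skip` option handling and property name are assumptions) shows how the two halves typically connect under the old org.apache.hadoop.mapred API: the driver registers the file in the JobConf, and the mapper's configure method pulls down the local copies.

```java
// Driver side (inside run()), assuming a "-skip <file>" argument was seen:
// record the pattern file's URI in the JobConf so the framework copies the
// file to every task node before the map tasks start.
DistributedCache.addCacheFile(new Path(args[++i]).toUri(), conf);
conf.setBoolean("wordcount.skip.patterns", true);

// Mapper side (inside configure(JobConf job)): fetch the node-local paths
// of the cached files and hand each one to parseSkipFile().
Path[] patternsFiles = DistributedCache.getLocalCacheFiles(job);
for (Path patternsFile : patternsFiles) {
    parseSkipFile(patternsFile);
}
```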
Line 13 defines the main WordCount class:
13 public class WordCount extends Configured implements Tool
Meanwhile, the Map class is defined at line 17:
17 public static class Map extends MapReduceBase
18 implements Mapper < LongWritable, Text, Text, IntWritable >
This class now has a configure method defined at line 36, which offers case-sensitivity and pattern-skipping
functionality:
36 public void configure(JobConf job)
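The per-line preprocessing that these options enable can be illustrated outside Hadoop. The helper below is a hypothetical, self-contained analogue (not the book's code): it folds case when case sensitivity is off, then deletes every occurrence of each skip pattern before the line would be tokenized into words.

```java
import java.util.Set;
import java.util.regex.Pattern;

public class LineCleaner {
    // Mirror of the mapper's per-line preprocessing: optionally lowercase,
    // then remove every occurrence of each skip pattern from the line.
    static String clean(String line, boolean caseSensitive, Set<String> patterns) {
        String result = caseSensitive ? line : line.toLowerCase();
        for (String pattern : patterns) {
            // Pattern.quote() treats the skip entry as a literal string,
            // so punctuation such as "," or "!" needs no escaping here.
            result = result.replaceAll(Pattern.quote(pattern), "");
        }
        return result;
    }

    public static void main(String[] args) {
        // With case sensitivity off, "Hello, World!" becomes "hello world".
        System.out.println(clean("Hello, World!", false, Set.of(",", "!")));
    }
}
```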
The parseSkipFile method at line 60 parses the pattern file for the pattern-skipping functionality just
mentioned. The patternsFile contains a list of patterns that should be removed from the text to be processed when
counting words:
60 private void parseSkipFile(Path patternsFile)
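Its job is simple: read the cached file line by line and collect one pattern per line. A plain-java.io analogue (Hadoop's FileSystem reader replaced by java.nio; class and variable names are illustrative) looks like this:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

public class SkipFileParser {
    // Reads the pattern file one line at a time, accumulating each line
    // into a set, just as the mapper fills its patterns-to-skip collection.
    static Set<String> parseSkipFile(Path patternsFile) throws IOException {
        Set<String> patternsToSkip = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(patternsFile)) {
            String pattern;
            while ((pattern = reader.readLine()) != null) {
                patternsToSkip.add(pattern);
            }
        }
        return patternsToSkip;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway pattern file and parse it back.
        Path tmp = Files.createTempFile("patterns", ".txt");
        Files.write(tmp, java.util.List.of("\\.", "\\,", "to"));
        System.out.println(parseSkipFile(tmp));
    }
}
```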
 