    Job job = new Job(getConf());
    job.setJarByClass(getClass());
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new MinimalMapReduce(), args);
    System.exit(exitCode);
  }
}
The only configuration that we set is an input path and an output path. We run it over a
subset of our weather data with the following:
% hadoop MinimalMapReduce "input/ncdc/all/190{1,2}.gz" output
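Running a bare class name with the hadoop command like this works only if the class is on Hadoop's classpath. Assuming the compiled job classes have been packaged in a JAR (named job.jar here purely for illustration), that can be arranged with:
% export HADOOP_CLASSPATH=job.jar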
We do get some output: one file named part-r-00000 in the output directory. Here's what
the first few lines look like (truncated to fit the page):
0→0029029070999991901010106004+64333+023450FM-12+000599999V0202701N01591...
0→0035029070999991902010106004+64333+023450FM-12+000599999V0201401N01181...
135→0029029070999991901010113004+64333+023450FM-12+000599999V0202901N00821...
141→0035029070999991902010113004+64333+023450FM-12+000599999V0201401N01181...
270→0029029070999991901010120004+64333+023450FM-12+000599999V0209991C00001...
282→0035029070999991902010120004+64333+023450FM-12+000599999V0201401N01391...
Each line is an integer followed by a tab character, followed by the original weather data record. The integer is the offset of the record within its file, which is the key that the default input format produces. Admittedly, it's not a very useful program, but understanding how it produces its output does provide some insight into the defaults that Hadoop uses when running MapReduce jobs. Example 8-1 shows a program that has exactly the same effect as MinimalMapReduce, but explicitly sets the job settings to their defaults.
Example 8-1. A minimal MapReduce driver, with the defaults explicitly set
public class MinimalMapReduceWithDefaults extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (job == null) {
      return -1;
    }
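The remainder of run() sets each job property explicitly to its default. A sketch of those settings, based on the documented defaults of the new (org.apache.hadoop.mapreduce) API (the identity Mapper and Reducer, TextInputFormat and TextOutputFormat, HashPartitioner, and a single reduce task), with a main() mirroring the one in MinimalMapReduce:

    job.setInputFormatClass(TextInputFormat.class);   // default input format: lines of text

    job.setMapperClass(Mapper.class);                 // the identity mapper
    job.setMapOutputKeyClass(LongWritable.class);     // key: offset of the line in the file
    job.setMapOutputValueClass(Text.class);           // value: the line itself

    job.setPartitionerClass(HashPartitioner.class);   // partition by hash of the key

    job.setNumReduceTasks(1);                         // a single reduce task by default
    job.setReducerClass(Reducer.class);               // the identity reducer

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    job.setOutputFormatClass(TextOutputFormat.class); // writes key TAB value per record

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new MinimalMapReduceWithDefaults(), args);
    System.exit(exitCode);
  }
}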