public class MaxTemperature {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperature <input path> <output path>");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
A Job object forms the specification of the job and gives you control over how the job is run. When we run this job on a Hadoop cluster, we will package the code into a JAR file (which Hadoop will distribute around the cluster). Rather than explicitly specifying the name of the JAR file, we can pass a class to the Job's setJarByClass() method, which Hadoop uses to locate the relevant JAR file by looking for the JAR file containing that class.
Having constructed a Job object, we specify the input and output paths. An input path is specified by calling the static addInputPath() method on FileInputFormat, and it can be a single file, a directory (in which case the input comprises all the files in that directory), or a file pattern. As the name suggests, addInputPath() can be called more than once to use input from multiple paths.
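The way a directory input expands to the files inside it can be illustrated with a plain-Java sketch. This uses java.nio.file to show the semantics only; it is not Hadoop's implementation, and the file names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Stream;

public class InputPathExpansion {

    // Expand a list of input paths the way directory inputs behave conceptually:
    // a file contributes itself; a directory contributes the files inside it.
    static List<Path> expand(List<Path> inputPaths) throws IOException {
        List<Path> files = new ArrayList<>();
        for (Path p : inputPaths) {
            if (Files.isDirectory(p)) {
                try (Stream<Path> entries = Files.list(p)) {
                    entries.filter(Files::isRegularFile).forEach(files::add);
                }
            } else {
                files.add(p); // a single file is used as-is
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: one directory of input files plus one extra file.
        Path dir = Files.createTempDirectory("ncdc");
        Files.createFile(dir.resolve("1901"));
        Files.createFile(dir.resolve("1902"));
        Path extra = Files.createTempFile("extra", ".txt");

        List<Path> files = expand(List.of(dir, extra));
        System.out.println(files.size() + " input files"); // 3 input files
    }
}
```

Calling expand with several paths, some of them directories, mirrors calling addInputPath() several times: the union of all the resolved files becomes the job's input.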
The output path (of which there is only one) is specified by the static setOutputPath() method on FileOutputFormat. It specifies a directory where the output files from the reduce function are written. The directory shouldn't exist before running the job; if it does, Hadoop will complain and not run the job. This precaution is to prevent data loss.
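The fail-fast check that protects an existing output directory can be mimicked in plain Java. This is a minimal sketch of the behavior using java.nio.file, not Hadoop's actual code, and the directory name is hypothetical:

```java
import java.io.IOException;
import java.nio.file.*;

public class OutputPathCheck {

    // Refuse to proceed if the output directory already exists, mirroring
    // Hadoop's pre-flight check that keeps a job from silently overwriting
    // the results of an earlier run.
    static void checkOutputPath(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(
                outputDir + " already exists; refusing to overwrite");
        }
        Files.createDirectories(outputDir);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location under a temporary base directory.
        Path out = Files.createTempDirectory("job").resolve("max-temp-output");

        checkOutputPath(out); // first run: directory is created
        try {
            checkOutputPath(out); // second run fails, as Hadoop would
        } catch (FileAlreadyExistsException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

Failing before any work starts, rather than after the reduce phase has written partial output, is the same design choice Hadoop makes: it can be mildly annoying during development, but it is far cheaper than losing the output of a long-running job.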