public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
A Job object forms the specification of the job and gives you control over how the job is run. When we run this job on a Hadoop cluster, we will package the code into a JAR file (which Hadoop will distribute around the cluster). Rather than explicitly specifying the name of the JAR file, we can pass a class in the Job's setJarByClass() method, which Hadoop will use to locate the relevant JAR file by looking for the JAR file containing this class.
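The lookup that setJarByClass() relies on can be illustrated with plain Java: every class loaded by an application class loader carries a code source recording the JAR file or directory it came from. The sketch below (JarLocator is a hypothetical name, not part of Hadoop) shows that mechanism in isolation; Hadoop uses a similar class-to-JAR mapping to decide which JAR to ship to the cluster.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

public class JarLocator {
    // Return the URL of the code source (a JAR file or a classes directory)
    // that a class was loaded from; may be null for bootstrap classes.
    static URL codeSource(Class<?> cls) {
        ProtectionDomain pd = cls.getProtectionDomain();
        CodeSource cs = pd.getCodeSource();
        return cs == null ? null : cs.getLocation();
    }

    public static void main(String[] args) {
        // Our own class was loaded from the JAR or directory we ran from...
        System.out.println("JarLocator loaded from: " + codeSource(JarLocator.class));
        // ...whereas a bootstrap class such as String typically has no code source.
        System.out.println("String loaded from: " + codeSource(String.class));
    }
}
```

Running this from a packaged JAR prints that JAR's location, which is exactly the piece of information Hadoop needs when it distributes the job's code.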
Having constructed a Job object, we specify the input and output paths. An input path is specified by calling the static addInputPath() method on FileInputFormat, and it can be a single file, a directory (in which case the input forms all the files in that directory), or a file pattern. As the name suggests, addInputPath() can be called more than once to use input from multiple paths.
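Because addInputPath() accumulates paths rather than replacing them, one job can draw input from several sources at once. A small configuration sketch (the /ncdc/... paths are hypothetical) showing the three kinds of input path side by side:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MultipleInputPaths {
    public static void main(String[] args) throws Exception {
        Job job = new Job();
        // Each call adds to the job's input, so all three paths are read:
        FileInputFormat.addInputPath(job, new Path("/ncdc/1901"));         // a directory
        FileInputFormat.addInputPath(job, new Path("/ncdc/1902/data.gz")); // a single file
        FileInputFormat.addInputPath(job, new Path("/ncdc/19*"));          // a file pattern
    }
}
```

This fragment only configures the job, so it needs a Hadoop installation on the classpath to compile and a cluster (or local runner) to do anything useful.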
The output path (of which there is only one) is specified by the static setOutputPath() method on FileOutputFormat. It specifies a directory where the output files from the reduce function are written. The directory shouldn't exist before running the job because Hadoop will complain and not run the job. This precaution is to prevent data