MaxTemperatureDriver implements the Tool interface, so we get the benefit of being able to
set the options that GenericOptionsParser supports. The run() method constructs a Job object
based on the tool's configuration, which it uses to launch a job. Among the possible job
configuration parameters, we set the input and output file paths; the mapper, reducer, and
combiner classes; and the output types (the input types are determined by the input format,
which defaults to TextInputFormat and has LongWritable keys and Text values). It's also a
good idea to set a name for the job (Max temperature) so that you can pick it out in the job
list during execution and after it has completed. By default, the name is the name of the
JAR file, which normally is not particularly descriptive.
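To make the input types concrete: with the default text input format, each record handed to the mapper is a line of the file (the Text value) keyed by the byte offset at which that line starts (the LongWritable key). The following is a minimal plain-Java sketch of that record shape only. It does not use the Hadoop API, and the class name TextRecords and the sample text are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: mimics the (offset, line)
// records that a text input format presents to a mapper.
public class TextRecords {

    // Splits newline-terminated text into (byte offset, line) pairs.
    static Map<Long, String> toRecords(String contents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : contents.split("\n")) {
            records.put(offset, line);
            // Advance past the line's bytes plus its terminating newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // Sample text is invented; a real job would read the input files.
        toRecords("first line\nsecond\nthird\n")
            .forEach((offset, line) -> System.out.println(offset + "\t" + line));
    }
}
```

A mapper for this kind of job usually ignores the offset key and parses only the line, which is why the driver needs to declare the output types but not the input types.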
Now we can run this application against some local files. Hadoop comes with a local job
runner, a cut-down version of the MapReduce execution engine for running MapReduce
jobs in a single JVM. It's designed for testing and is very convenient for use in an IDE,
since you can run it in a debugger to step through the code in your mapper and reducer.
The local job runner is used if mapreduce.framework.name is set to local, which is the default.
From the command line, we can run the driver by typing:
% mvn compile
% export HADOOP_CLASSPATH=target/classes/
% hadoop v2.MaxTemperatureDriver -conf conf/hadoop-local.xml \
  input/ncdc/micro output
Equivalently, we could use the -fs and -jt options provided by GenericOptionsParser:
% hadoop v2.MaxTemperatureDriver -fs file:/// -jt local input/ncdc/micro output
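The two invocations are equivalent because the generic options are turned into configuration properties before run() is called, leaving only the input and output paths as arguments. The sketch below illustrates that idea in plain Java. It is a deliberately simplified stand-in, not the real GenericOptionsParser: the class GenericOptionsSketch is hypothetical, it handles only -fs and "-jt local", and it assumes these map to the fs.defaultFS and mapreduce.framework.name properties.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for generic option handling.
public class GenericOptionsSketch {
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<String> remainingArgs = new ArrayList<>();

    GenericOptionsSketch(String[] args) {
        int i = 0;
        while (i < args.length) {
            if (args[i].equals("-fs") && i + 1 < args.length) {
                // Assumed mapping: -fs sets the default filesystem URI.
                properties.put("fs.defaultFS", args[i + 1]);
                i += 2;
            } else if (args[i].equals("-jt") && i + 1 < args.length) {
                // Assumed mapping: only the "local" value is modelled here.
                if (args[i + 1].equals("local")) {
                    properties.put("mapreduce.framework.name", "local");
                }
                i += 2;
            } else {
                // Anything else is left for the tool's own run() method.
                remainingArgs.add(args[i]);
                i++;
            }
        }
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch(new String[] {
            "-fs", "file:///", "-jt", "local", "input/ncdc/micro", "output"});
        System.out.println(parsed.properties);
        System.out.println(parsed.remainingArgs);
    }
}
```

Under these assumptions the command-line flags yield the same two settings one would expect a local configuration file to contain, and the driver sees just the two path arguments.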
This command executes MaxTemperatureDriver using input from the local input/ncdc/micro
directory, producing output in the local output directory. Note that although we've set -fs
so we use the local filesystem (file:///), the local job runner will actually work fine
against any filesystem, including HDFS (and it can be handy to do this if you have a few
files that are on HDFS).
We can examine the output on the local filesystem:
% cat output/part-r-00000
1949 111
1950 22
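Each output line is a year followed by the maximum temperature seen for that year. As a rough illustration of what the job computes, the plain-Java sketch below groups (year, temperature) pairs by year, keeps the maximum, and prints the result in key order. It does not use the Hadoop API; the class name MaxTemperatureSketch and the input readings are invented for illustration, chosen so that the maxima match the output shown above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the map, group, and reduce steps.
public class MaxTemperatureSketch {

    // Each element of pairs is {year, temperature}.
    static Map<Integer, Integer> maxByYear(int[][] pairs) {
        // TreeMap keeps years sorted, like the keys in a reducer's output.
        Map<Integer, Integer> max = new TreeMap<>();
        for (int[] pair : pairs) {
            // Keep the larger of the stored value and the new reading.
            max.merge(pair[0], pair[1], Integer::max);
        }
        return max;
    }

    public static void main(String[] args) {
        // Invented readings, not taken from the NCDC sample files.
        int[][] readings = {
            {1950, 0}, {1950, 22}, {1950, -11}, {1949, 111}, {1949, 78}};
        maxByYear(readings)
            .forEach((year, temp) -> System.out.println(year + "\t" + temp));
    }
}
```

Because taking a maximum gives the same answer whether it is applied to all readings at once or to partial maxima, the same logic can run as a combiner on each map's output before the reduce step.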