MaxTemperatureDriver implements the Tool interface, so we get the benefit of being
able to set the options that GenericOptionsParser supports. The run() method
constructs a Job object based on the tool's configuration, which it uses to launch a job.
Among the possible job configuration parameters, we set the input and output file paths;
the mapper, reducer, and combiner classes; and the output types (the input types are
determined by the input format, which defaults to TextInputFormat and has
LongWritable keys and Text values). It's also a good idea to set a name for the job
(Max temperature) so that you can pick it out in the job list during execution and
after it has completed. By default, the name is the name of the JAR file, which normally is
not particularly descriptive.
Now we can run this application against some local files. Hadoop comes with a local job
runner, a cut-down version of the MapReduce execution engine for running MapReduce
jobs in a single JVM. It's designed for testing and is very convenient for use in an IDE,
since you can run it in a debugger to step through the code in your mapper and reducer.
The local job runner is used if mapreduce.framework.name is set to local, which
is the default. [49]
From the command line, we can run the driver by typing:
% mvn compile
% export HADOOP_CLASSPATH=target/classes/
% hadoop v2.MaxTemperatureDriver -conf conf/hadoop-local.xml \
input/ncdc/micro output
Equivalently, we could use the -fs and -jt options provided by
GenericOptionsParser:
% hadoop v2.MaxTemperatureDriver -fs file:/// -jt local input/ncdc/micro output
This command executes MaxTemperatureDriver using input from the local
input/ncdc/micro directory, producing output in the local output directory. Note that although
we've set -fs so we use the local filesystem (file:///), the local job runner will actually
work fine against any filesystem, including HDFS (and it can be handy to do this if
you have a few files that are on HDFS).
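To make the effect of these generic options more concrete, here is a plain-Java sketch of how -fs and -jt local could be folded into configuration properties before the remaining arguments reach run(). It is a simplified model, not Hadoop's GenericOptionsParser: the class GenericOptionsSketch and its fields are hypothetical, only the two options used above are handled, and it assumes that -fs corresponds to the fs.defaultFS property (the Hadoop 2 name) and that -jt local selects the local job runner through mapreduce.framework.name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of generic option handling.
// It does NOT use or reproduce Hadoop's own parser.
class GenericOptionsSketch {
    // Configuration properties derived from the generic options.
    public final Map<String, String> properties = new LinkedHashMap<>();
    // Whatever is left over: the arguments the tool itself would see.
    public final List<String> remainingArgs = new ArrayList<>();

    public GenericOptionsSketch(String[] args) {
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            if (arg.equals("-fs") && i + 1 < args.length) {
                // Assumption: -fs sets the default filesystem URI.
                properties.put("fs.defaultFS", args[i + 1]);
                i += 2;
            } else if (arg.equals("-jt") && i + 1 < args.length) {
                // Only the "local" case is modeled here.
                if (args[i + 1].equalsIgnoreCase("local")) {
                    properties.put("mapreduce.framework.name", "local");
                }
                i += 2;
            } else {
                remainingArgs.add(arg);
                i++;
            }
        }
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch(new String[] {
            "-fs", "file:///", "-jt", "local", "input/ncdc/micro", "output"
        });
        System.out.println(parsed.properties);
        System.out.println(parsed.remainingArgs);
    }
}
```

Under these assumptions, the driver would be left with only the input and output paths as its own arguments, while the filesystem and job runner settings arrive through the configuration.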
We can examine the output on the local filesystem:
% cat output/part-r-00000
1949 111
1950 22
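If the results need to be consumed from code rather than read with cat, each line can be split into a year and a temperature. The following is a minimal sketch in plain Java, assuming the key and value are separated by a tab (the default separator for Hadoop's text output; the listing above does not show which whitespace is used). The class name OutputReaderSketch is illustrative, and the fallback sample lines simply repeat the two records shown above.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper, not part of Hadoop or of the example project.
class OutputReaderSketch {
    // Parses "year<TAB>temperature" lines into a map that keeps file order.
    public static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> maxByYear = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            maxByYear.put(fields[0], Integer.parseInt(fields[1].trim()));
        }
        return maxByYear;
    }

    public static void main(String[] args) throws Exception {
        // With a path argument (for example output/part-r-00000) read that file;
        // otherwise fall back to the two records from the listing above.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : Arrays.asList("1949\t111", "1950\t22");
        parse(lines).forEach((year, temp) -> System.out.println(year + " -> " + temp));
    }
}
```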