    Job job = new Job(getConf());
    job.setJarByClass(getClass());
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new MinimalMapReduce(), args);
    System.exit(exitCode);
  }
}
The only configuration that we set is an input path and an output path. We run it over a
subset of our weather data with the following:
% hadoop MinimalMapReduce "input/ncdc/all/190{1,2}.gz" output
We do get some output: one file named part-r-00000 in the output directory. Here's what the first few lines look like (truncated to fit the page):
0→0029029070999991901010106004+64333+023450FM-12+000599999V0202701N01591...
0→0035029070999991902010106004+64333+023450FM-12+000599999V0201401N01181...
135→0029029070999991901010113004+64333+023450FM-12+000599999V0202901N00821...
141→0035029070999991902010113004+64333+023450FM-12+000599999V0201401N01181...
270→0029029070999991901010120004+64333+023450FM-12+000599999V0209991C00001...
282→0035029070999991902010120004+64333+023450FM-12+000599999V0201401N01391...
Each line is an integer, followed by a tab character, followed by the original weather data record. Admittedly, it's not a very useful program, but understanding how it produces its output does provide some insight into the defaults that Hadoop uses when running MapReduce jobs.
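To see why the integer and the tab appear, consider what the defaults do. With the default input format, TextInputFormat, the key for each record is the byte offset of the line within the file and the value is the line itself; the default mapper and reducer simply pass their input through unchanged. The following sketch mirrors that identity behavior (the class name IdentityLineMapper is ours, purely for illustration; it is not the actual Hadoop source):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Roughly what the stock Mapper class does for every record when you don't
// set a mapper of your own.
public class IdentityLineMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Write the byte offset and the line back out untouched. The default
    // reducer is also an identity function, so offset-tab-record pairs end
    // up in the single output file, part-r-00000.
    context.write(offset, line);
  }
}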
Example 8-1 shows a program that has exactly the same effect as MinimalMapReduce, but explicitly sets the job settings to their defaults.
Example 8-1. A minimal MapReduce driver, with the defaults explicitly set
public class MinimalMapReduceWithDefaults extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (job == null) {
      return -1;
    }