    job.setInputFormatClass(TextInputFormat.class);

    job.setMapperClass(Mapper.class);

    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);

    job.setPartitionerClass(HashPartitioner.class);

    job.setNumReduceTasks(1);
    job.setReducerClass(Reducer.class);

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    job.setOutputFormatClass(TextOutputFormat.class);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new MinimalMapReduceWithDefaults(), args);
    System.exit(exitCode);
  }
}
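To make the partitioner and reducer-count settings in the listing concrete, here is a minimal, Hadoop-free sketch of the arithmetic that HashPartitioner applies to a key: mask off the sign bit of the hash code, then take the remainder modulo the number of reduce tasks. The class name HashPartitionSketch and the method partitionFor are hypothetical names used only for illustration, and java.lang.Long stands in for LongWritable, so the hash values here are not guaranteed to match what a real job would compute.

```java
public class HashPartitionSketch {

    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit so the
    // result is never negative, then take the remainder by the reducer count.
    // (Hypothetical helper; the key type is a plain Object for illustration.)
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Long is an assumed stand-in for LongWritable byte offsets.
        Long[] offsets = {0L, 42L, 1_000_000L, -7L};
        for (Long offset : offsets) {
            System.out.println(offset
                + " -> partition " + partitionFor(offset, 1) + " (1 reducer), "
                + "partition " + partitionFor(offset, 4) + " (4 reducers)");
        }
    }
}
```

With a single reduce task, as set in the listing, every key lands in partition 0, so the choice of partitioner only starts to matter once the number of reducers is raised.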
We've simplified the first few lines of the run() method by extracting the logic for printing usage and setting the input and output paths into a helper method. Almost all MapReduce drivers take these two arguments (input and output), so reducing the boilerplate code here is a good thing. Here are the relevant methods in the JobBuilder class for reference:
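  public static Job parseInputAndOutput(Tool tool, Configuration conf,
      String[] args) throws IOException {

    // Expect exactly two arguments: the input path and the output path.
    if (args.length != 2) {
      printUsage(tool, "<input> <output>");
      return null;
    }
    // Note: the Job(Configuration) constructor is deprecated in Hadoop 2 and
    // later, where Job.getInstance(conf) is the replacement.
    Job job = new Job(conf);
    // Locate the job JAR from the class of the calling driver.
    job.setJarByClass(tool.getClass());
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job;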