Testing the Driver
Apart from the flexible configuration options it offers, making your application implement Tool also makes it more testable, because it allows you to inject an arbitrary Configuration. You can take advantage of this to write a test that uses a local job runner to run a job against known input data and then checks that the output is as expected.
There are two approaches to doing this. The first is to use the local job runner and run the job against a test file on the local filesystem. The code in Example 6-11 gives an idea of how to do this.
Example 6-11. A test for MaxTemperatureDriver that uses a local, in-process job runner
@Test
public void test() throws Exception {
  Configuration conf = new Configuration();
  conf.set("fs.defaultFS", "file:///");
  conf.set("mapreduce.framework.name", "local");
  conf.setInt("mapreduce.task.io.sort.mb", 1);

  Path input = new Path("input/ncdc/micro");
  Path output = new Path("output");

  FileSystem fs = FileSystem.getLocal(conf);
  fs.delete(output, true); // delete old output

  MaxTemperatureDriver driver = new MaxTemperatureDriver();
  driver.setConf(conf);

  int exitCode = driver.run(new String[] {
      input.toString(), output.toString() });
  assertThat(exitCode, is(0));

  checkOutput(conf, output);
}
The test explicitly sets fs.defaultFS and mapreduce.framework.name so it uses the local filesystem and the local job runner. It then runs the MaxTemperatureDriver via its Tool interface against a small amount of known data. At the end of the test, the checkOutput() method is called to compare the actual output with the expected output, line by line.
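The body of checkOutput() is not shown here, so the following is only a minimal sketch of what a line-by-line comparison could look like. It makes several assumptions: the class and method names (OutputComparison, firstMismatch, assertSameLines) are hypothetical, the files are read with java.nio.file rather than Hadoop's FileSystem API, and both the expected and actual output are small text files on the local filesystem.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not the checkOutput() used in the example above.
// Note: Path here is java.nio.file.Path, not org.apache.hadoop.fs.Path.
final class OutputComparison {

  /** Returns the 1-based number of the first differing line, or -1 if the files match. */
  static int firstMismatch(Path expected, Path actual) throws IOException {
    List<String> expectedLines = Files.readAllLines(expected, StandardCharsets.UTF_8);
    List<String> actualLines = Files.readAllLines(actual, StandardCharsets.UTF_8);

    // Compare the lines that both files have in common.
    int common = Math.min(expectedLines.size(), actualLines.size());
    for (int i = 0; i < common; i++) {
      if (!expectedLines.get(i).equals(actualLines.get(i))) {
        return i + 1;
      }
    }

    // If one file is a prefix of the other, the first extra line is the mismatch.
    return expectedLines.size() == actualLines.size() ? -1 : common + 1;
  }

  /** Throws an AssertionError naming the first differing line, if there is one. */
  static void assertSameLines(Path expected, Path actual) throws IOException {
    int line = firstMismatch(expected, actual);
    if (line != -1) {
      throw new AssertionError("Output differs from expected at line " + line);
    }
  }
}
```

In a test like the one above, such a helper might be called with the path of an expected-results file and the path of the job's part file under the output directory (for a job with a single reducer this is conventionally named part-r-00000). Reporting the line number of the first mismatch, rather than a bare pass or fail, makes a failing test easier to diagnose.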
The second way of testing the driver is to run it using a “mini-” cluster. Hadoop has a set of testing classes, called MiniDFSCluster, MiniMRCluster, and MiniYARNCluster, that provide a programmatic way of creating in-process clusters. Un-