Testing the Driver
Making your application implement Tool not only gives you flexible configuration
options; it also makes the application more testable, because you can inject an
arbitrary Configuration. You can take advantage of this to write a test that uses a
local job runner to run a job against known input data and then checks that the output
is as expected.
There are two approaches to doing this. The first is to use the local job runner and run the
job against a test file on the local filesystem. The code in Example 6-11 gives an idea of
how to do this.
Example 6-11. A test for MaxTemperatureDriver that uses a local, in-process job runner
@Test
public void test() throws Exception {
  Configuration conf = new Configuration();
  conf.set("fs.defaultFS", "file:///");
  conf.set("mapreduce.framework.name", "local");
  conf.setInt("mapreduce.task.io.sort.mb", 1);

  Path input = new Path("input/ncdc/micro");
  Path output = new Path("output");

  FileSystem fs = FileSystem.getLocal(conf);
  fs.delete(output, true); // delete old output

  MaxTemperatureDriver driver = new MaxTemperatureDriver();
  driver.setConf(conf);

  int exitCode = driver.run(new String[] {
      input.toString(), output.toString() });
  assertThat(exitCode, is(0));

  checkOutput(conf, output);
}
The test explicitly sets fs.defaultFS and mapreduce.framework.name so it
uses the local filesystem and the local job runner. It then runs the
MaxTemperatureDriver via its Tool interface against a small amount of known data.
At the end of the test, the checkOutput() method is called to compare the actual
output with the expected output, line by line.
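The body of checkOutput() is not shown in this excerpt. As a rough illustration of what a line-by-line comparison can look like, here is a minimal sketch that uses only the JDK. The class name OutputComparison, the method firstMismatch(), and the two-path signature are assumptions made for this example and are not the book's helper; the Path type here is java.nio.file.Path, not Hadoop's Path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper (not the book's checkOutput()): compares two local
// text files line by line using only JDK classes.
class OutputComparison {

  // Returns the zero-based index of the first differing line,
  // or -1 if both lists contain exactly the same lines.
  static int firstMismatch(List<String> expected, List<String> actual) {
    int common = Math.min(expected.size(), actual.size());
    for (int i = 0; i < common; i++) {
      if (!expected.get(i).equals(actual.get(i))) {
        return i;
      }
    }
    // If one list is a strict prefix of the other, the first missing
    // line counts as the mismatch.
    return expected.size() == actual.size() ? -1 : common;
  }

  // Reads both files and throws an AssertionError that names the first
  // differing line (reported one-based for readability).
  static void checkOutput(Path expectedFile, Path actualFile) throws IOException {
    List<String> expected = Files.readAllLines(expectedFile, StandardCharsets.UTF_8);
    List<String> actual = Files.readAllLines(actualFile, StandardCharsets.UTF_8);
    int line = firstMismatch(expected, actual);
    if (line >= 0) {
      throw new AssertionError("Output differs from expected at line " + (line + 1));
    }
  }
}
```

Reporting the index of the first mismatch, rather than a plain true or false, makes a failing test easier to diagnose. A real test against job output would also need to locate the job's output files inside the output directory, which this sketch leaves out.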
The second way of testing the driver is to run it using a “mini-” cluster. Hadoop has a set
of testing classes, called MiniDFSCluster, MiniMRCluster, and MiniYARNCluster,
that provide a programmatic way of creating in-process clusters.