Handling malformed data
Capturing the input data that causes a problem is valuable, because we can use it in a
test to check that the mapper does the right thing. In this MRUnit test, we check that the
counter is updated for the malformed input:
  @Test
  public void parsesMalformedTemperature() throws IOException,
      InterruptedException {
    Text value = new
        Text("0335999999433181957042302005+37950+139117SAO +0004" +
            // Year ^^^^
            "RJSN V02011359003150070356999999433201957010100005+353");
            // Temperature ^^^^^
    Counters counters = new Counters();
    new MapDriver<LongWritable, Text, Text, IntWritable>()
        .withMapper(new MaxTemperatureMapper())
        .withInput(new LongWritable(0), value)
        .withCounters(counters)
        .runTest();
    Counter c =
        counters.findCounter(MaxTemperatureMapper.Temperature.MALFORMED);
    assertThat(c.getValue(), is(1L));
  }
The record that was causing the problem is in a different format from the other lines
we've seen. Example 6-12 shows a modified program (version 4) whose parser ignores any
line in which the temperature field does not have a leading sign (plus or minus). We've
also introduced a counter to measure the number of records that we are ignoring for this
reason.
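The sign check and the counter can be illustrated without any Hadoop dependencies. The following is only a minimal sketch under stated assumptions, not the book's NcdcRecordParser: the class name SignedTemperatureSketch, the method parseSignedField, and the map-based counters are hypothetical stand-ins, and the sketch also treats any non-digit content after the sign as malformed, which is a slightly broader rule than the one described above.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, Hadoop-free illustration of "ignore unsigned temperature
// fields and count them". None of these names come from the original program
// except the Temperature.MALFORMED enum constant.
class SignedTemperatureSketch {

    enum Temperature {
        MALFORMED
    }

    // Stand-in for Hadoop's counters: a plain map from enum constant to count.
    static final Map<Temperature, Long> counters =
        new EnumMap<>(Temperature.class);

    /**
     * Parses a temperature field such as "+0022" or "-0011".
     * Returns null, and increments the MALFORMED count, when the field does
     * not consist of a leading sign followed by one to nine digits
     * (assumption: the digit bound just keeps Integer.parseInt from
     * overflowing).
     */
    static Integer parseSignedField(String field) {
        if (field == null || !field.matches("[+-]\\d{1,9}")) {
            counters.merge(Temperature.MALFORMED, 1L, Long::sum);
            return null;
        }
        // Strip the sign ourselves rather than relying on parseInt to
        // accept a leading plus.
        int magnitude = Integer.parseInt(field.substring(1));
        return field.charAt(0) == '-' ? -magnitude : magnitude;
    }
}
```

With this sketch, parseSignedField("+0022") returns 22 and parseSignedField("-0011") returns -11, whereas an unsigned field such as "01957" returns null and raises the MALFORMED count by one. In the real mapper the count would be kept in a Hadoop counter rather than a map, so that it is aggregated across tasks.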
Example 6-12. Mapper for the maximum temperature example
public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  enum Temperature {
    MALFORMED
  }

  private NcdcRecordParser parser = new NcdcRecordParser();

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    parser.parse(value);