Handling malformed data
Capturing input data that causes a problem is valuable, as we can use it in a test to check
that the mapper does the right thing. In this MRUnit test, we check that the counter is
updated for the malformed input:
  @Test
  public void parsesMalformedTemperature() throws IOException,
      InterruptedException {
    Text value = new Text("0335999999433181957042302005+37950+139117SAO +0004" +
                                  // Year ^^^^
        "RJSN V02011359003150070356999999433201957010100005+353");
                              // Temperature ^^^^^
    Counters counters = new Counters();
    new MapDriver<LongWritable, Text, Text, IntWritable>()
      .withMapper(new MaxTemperatureMapper())
      .withInput(new LongWritable(0), value)
      .withCounters(counters)
      .runTest();
    Counter c = counters.findCounter(MaxTemperatureMapper.Temperature.MALFORMED);
    assertThat(c.getValue(), is(1L));
  }
The record that was causing the problem is of a different format than the other lines we've
seen. Example 6-12 shows a modified program (version 4) using a parser that ignores each
line with a temperature field that does not have a leading sign (plus or minus). We've also
introduced a counter to measure the number of records that we are ignoring for this reason.
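To make the leading-sign rule concrete, here is a minimal standalone sketch that does not depend on Hadoop. It is not the NcdcRecordParser used in this chapter: the class name SignCheckSketch, the methods parseField and malformedCount, and the in-memory EnumMap tally are all hypothetical stand-ins, and the sketch assumes the temperature field has already been cut out of the record as a string.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical illustration only; not the parser or mapper from this chapter.
class SignCheckSketch {

  // Hypothetical counter key, standing in for a counter enum in a mapper.
  enum Temperature { MALFORMED }

  // Plain in-memory tally; a real mapper would report through its task context.
  private final Map<Temperature, Long> counters = new EnumMap<>(Temperature.class);

  // Accepts a field only if it is a sign (+ or -) followed by 1 to 9 digits.
  // Any other field is counted as malformed and skipped.
  OptionalInt parseField(String field) {
    if (!field.matches("[+-]\\d{1,9}")) {
      counters.merge(Temperature.MALFORMED, 1L, Long::sum);
      return OptionalInt.empty();
    }
    int magnitude = Integer.parseInt(field.substring(1));
    return OptionalInt.of(field.charAt(0) == '-' ? -magnitude : magnitude);
  }

  // Number of fields skipped so far because they were malformed.
  long malformedCount() {
    return counters.getOrDefault(Temperature.MALFORMED, 0L);
  }

  public static void main(String[] args) {
    SignCheckSketch sketch = new SignCheckSketch();
    // Sample fields assumed for illustration: two signed, one without a sign.
    for (String field : new String[] {"+0022", "-0011", "01957"}) {
      OptionalInt t = sketch.parseField(field);
      System.out.println(field + " -> " + (t.isPresent() ? t.getAsInt() : "skipped"));
    }
    System.out.println("malformed: " + sketch.malformedCount());
  }
}
```

Returning an empty OptionalInt rather than throwing keeps one bad field from stopping the run, and the tally records how many fields were dropped, which is the same trade-off the counter in the mapper is meant to expose.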
Example 6-12. Mapper for the maximum temperature example
public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  enum Temperature {
    MALFORMED
  }

  private NcdcRecordParser parser = new NcdcRecordParser();

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    parser.parse(value);