}
}
Again, this is a particularly good place for a unit test. Scrubbing tokens is a likely point
where edge cases will be encountered at scale. In practice, you'd want to define even
more unit tests.
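As a rough sketch of what a few more edge-case checks might look like, the plain-Java example below uses a hypothetical scrub method as a stand-in for the token scrubber. The trim-and-lowercase behavior, the class name, and the sample inputs are all assumptions made for illustration. A real test would call the actual scrubbing code through the project's unit test framework.

```java
import java.util.Locale;

// Hypothetical stand-in for the token scrubber. The trim-and-lowercase
// behavior is an assumption for illustration, not the book's code.
class ScrubEdgeCases {
    static String scrub(String token) {
        // Treat a missing token as empty rather than failing on null.
        return token == null ? "" : token.trim().toLowerCase(Locale.ROOT);
    }

    // Throws an AssertionError when the scrubbed value differs from the expectation.
    static void check(String input, String expected) {
        String actual = scrub(input);
        if (!actual.equals(expected)) {
            throw new AssertionError("scrub(" + input + ") gave '" + actual + "'");
        }
    }

    public static void main(String[] args) {
        check("  Rain ", "rain");        // surrounding spaces
        check("\tSHADOW\n", "shadow");   // tabs and newlines
        check("", "");                   // empty token
        check(null, "");                 // missing token
        System.out.println("all edge cases passed");
    }
}
```

Each check covers an input that is rare in a small sample but likely to appear somewhere in a large data set.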
Going back to the Main.java module, let's see how to handle other kinds of unexpected
issues with data at scale. We'll add both a trap and a checkpoint as taps:
String trapPath = args[ 4 ];
String checkPath = args[ 5 ];
Tap trapTap = new Hfs( new TextDelimited( true, "\t" ), trapPath );
Tap checkTap = new Hfs( new TextDelimited( true, "\t" ), checkPath );
Next we'll modify the head of the pipe assembly for documents to incorporate a stream
assertion, as Figure 3-7 shows. This uses an AssertMatches to define the expected pattern
for data in the input tuple stream. There could be quite a large number of documents,
so it stands to reason that some data may become corrupted. In our case, another line
has been added to the example input data/rain.txt to exercise the assertion and trap.
Figure 3-7. Stream assertion and failure trap
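To make the idea concrete, here is a minimal plain-Java sketch of what a stream assertion paired with a failure trap does conceptually. It does not use the Cascading API. The regular expression, the class name, and the sample lines are assumptions made for illustration. Lines that match the expected pattern continue downstream, while lines that fail are diverted to a separate collection instead of stopping the whole run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java illustration only; this is not the Cascading API.
class AssertTrapSketch {
    // Assumed pattern: an ID such as "doc01", one whitespace character, then the text.
    static final Pattern EXPECTED = Pattern.compile("doc\\d+\\s.*");

    final List<String> passed = new ArrayList<>();
    final List<String> trapped = new ArrayList<>();

    // Route each line either to the main stream or to the trap.
    void process(List<String> lines) {
        for (String line : lines) {
            if (EXPECTED.matcher(line).matches()) {
                passed.add(line);
            } else {
                trapped.add(line);
            }
        }
    }

    public static void main(String[] args) {
        AssertTrapSketch sketch = new AssertTrapSketch();
        // Made-up sample input: one well-formed line and one corrupted line.
        sketch.process(Arrays.asList(
            "doc01\tA rain shadow is a dry area.",
            "zoink\tnull"));
        System.out.println("passed:  " + sketch.passed.size());
        System.out.println("trapped: " + sketch.trapped);
    }
}
```

Keeping the rejected lines rather than discarding them makes it possible to inspect the corrupted data after the job finishes.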