}
}
Again, this is a particularly good place for a unit test. Token scrubbing is a likely
place for edge cases to surface at scale. In practice, you'd want to define even
more unit tests.
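To make the idea of extra edge-case checks concrete, here is a minimal sketch in plain Java. It does not use Cascading or a test framework, and the class name ScrubChecks and the helper scrubText are hypothetical stand-ins: the sketch simply assumes that scrubbing means trimming whitespace and lowercasing, which may differ from the scrubbing logic in the actual application.

```java
import java.util.Locale;

public class ScrubChecks {
    // Hypothetical stand-in for the scrubbing logic (an assumption, not the
    // application's code): trim surrounding whitespace and lowercase the token.
    // A null token is treated as empty so callers can drop it.
    static String scrubText(String text) {
        if (text == null) {
            return "";
        }
        return text.trim().toLowerCase(Locale.ROOT);
    }

    // Small helper so each edge case reads as a single line.
    static void check(String input, String expected) {
        String actual = scrubText(input);
        if (!expected.equals(actual)) {
            throw new AssertionError("expected [" + expected + "] but got [" + actual + "]");
        }
    }

    public static void main(String[] args) {
        check("  Rain ", "rain");      // padded, mixed case
        check("\tSHADOW\n", "shadow"); // tabs and newlines around the token
        check("", "");                 // empty token
        check("   ", "");              // whitespace-only token
        check(null, "");               // missing token
        System.out.println("all scrub checks passed");
    }
}
```

Each call to check covers one kind of input that tends to appear only in large data sets: padding, stray control characters, and empty or missing tokens. In a real project these would more naturally live in the test framework the project already uses.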
Going back to the Main.java module, let's see how to handle other kinds of unexpected
issues with data at scale. We'll add both a trap and a checkpoint as taps:
String trapPath = args[ 4 ];
String checkPath = args[ 5 ];
Tap trapTap = new Hfs( new TextDelimited( true, "\t" ), trapPath );
Tap checkTap = new Hfs( new TextDelimited( true, "\t" ), checkPath );
Next we'll modify the head of the pipe assembly for documents to incorporate a stream
assertion, as Figure 3-7 shows. This uses an AssertMatches to define the expected pattern
for data in the input tuple stream. There could be quite a large number of documents,
so it stands to reason that some data may become corrupted. In our case, another line
has been added to the example input data/rain.txt to exercise the assertion and trap.
Figure 3-7. Stream assertion and failure trap
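The combination of a stream assertion and a failure trap can be illustrated without Cascading. The following is a minimal sketch in plain Java, not the Cascading API: the class AssertAndTrap, the method filter, the regular expression, and the sample lines are all assumptions made for illustration. It assumes each valid line holds a document ID such as doc01, a tab, and then the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AssertAndTrap {
    // Assumed shape of a valid input line: "doc" plus digits, a tab, then text.
    static final Pattern EXPECTED = Pattern.compile("doc\\d+\\t.*");

    // Lines that match the expected pattern pass through; the rest are
    // diverted to the trap list instead of failing the whole run.
    static List<String> filter(List<String> lines, List<String> trap) {
        List<String> passed = new ArrayList<>();
        for (String line : lines) {
            if (EXPECTED.matcher(line).matches()) {
                passed.add(line);
            } else {
                trap.add(line);
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two well-formed lines and one corrupted line.
        List<String> input = Arrays.asList(
            "doc01\tA rain shadow is a dry area.",
            "corrupted line without an id",
            "doc02\tThis sinking, dry air produces a rain shadow.");

        List<String> trap = new ArrayList<>();
        List<String> passed = filter(input, trap);

        System.out.println("passed:  " + passed.size());
        System.out.println("trapped: " + trap.size());
    }
}
```

The design point is the same one the text makes: at scale, a few malformed records should be set aside for later inspection rather than allowed to stop the workflow.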
 