that in fact even a small test suite can achieve much better code coverage than
a very large corpus.
As a reviewer pointed out, most linguistic phenomena are Zipfian in nature.
How far must we go in evaluating and handling the phenomena in the Zipfian
tail? Steedman has an insightful observation on this question:
We have come to believe that the linguists have forgotten Zipf's law,
which says that most of the variance in linguistic behavior can be
captured by a small part of the system.
The linguists, on the other hand, think that it is we who have forgotten
Zipf's law, which also says that most of the information about the
language system as a whole is in the Long Tail.
It is we who are at fault here, because the machine learning techniques
that we rely on are actually very bad at inducing systems for which the
crucial information is in rare events...
One day... the Long Tail will come back to haunt us.
[21]
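Steedman's point about the head and the tail can be made concrete with a toy calculation (a sketch of our own, not drawn from any of the cited work): under Zipf's law, the frequency of the rank-r type is roughly proportional to 1/r, so a handful of high-ranked types carries most of the token mass, while most of the vocabulary, and hence most of the information about the system as a whole, sits in the tail.

```python
# Toy illustration of Zipf's law: for a 50,000-type vocabulary with
# frequency proportional to 1/rank, compute how much of the token mass
# the top-k types cover. (Vocabulary size and exponent are illustrative.)
V = 50_000
weights = [1 / r for r in range(1, V + 1)]
total = sum(weights)

def head_mass(k):
    """Fraction of all tokens accounted for by the k most frequent types."""
    return sum(weights[:k]) / total

for k in (100, 1_000, 10_000):
    print(f"top {k:>6} types cover {head_mass(k):.1%} of tokens")
```

Under these assumptions, the top 100 of 50,000 types already cover nearly half of all tokens, which is why a system tuned on frequent events looks good on aggregate metrics while the tail, where the rare events live, goes largely unexercised.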
Even for work whose goal is not application-building but basic research, the
costs of failing to attend to basic software testing and quality assurance issues
can be quite severe. As Rob Knight has put it, “For scientific work, bugs don't
just mean unhappy users who you'll never actually meet: they mean retracted
publications and ended careers. It is critical that your code be fully tested before
you draw conclusions from the results it produces.” The recent case of Geoffrey
Chang (see [16] for a succinct discussion) is illustrative. In 2006, he was a star
of the protein crystallography world. That year he discovered a simple error
in his code that reversed the sign (positive versus negative) of two columns of
numbers in its output. This led to a reversed prediction of handedness in the
ABC transporter gene MsbA, an error with implications for the work of many
other scientists in addition to his own. The story is an object lesson in the
potential consequences of failing to attend to basic software testing and quality
assurance issues, although his principled response to the situation suggests that
in his case, those consequences will be limited to retracted publications and will
not be career-ending (see [5] for the retractions). For the
sorts of standard software testing techniques that we looked for in the work
reported here, a considerable amount of good material is available, ranging from
cookbook-like how-to manuals (e.g. [13]) to theoretical work [3,14,4]. Language
processing presents a number of specific testing issues related to unique
characteristics of the input data, and the literature on it is quite limited (but see
[6,12,7] for some attempts to address this topic in the biomedical natural
language processing domain, specifically). No non-trivial application is ever likely
to be completely free of bugs, but that does not free us of the obligation to test
for them. As we have shown here, approaches to doing so that are inspired by
linguistic techniques are effective at granular characterization of performance,
finding bugs, and achieving high code coverage.
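As a concrete (and entirely hypothetical) illustration of the kind of inexpensive check that would catch a Chang-style sign flip, consider a golden-output regression test for a data-conversion step: the function and data below are invented for illustration and do not reflect Chang's actual code.

```python
# Hypothetical golden-output regression test for a data-conversion step.
# A sign reversal in the output columns (as in the Chang case) would make
# the assertion fail immediately on a small, hand-verified input.

def convert(rows):
    """Toy stand-in for the conversion step under test (here: identity)."""
    return [list(r) for r in rows]

def test_conversion_preserves_sign():
    rows = [[1.5, -2.0], [-0.3, 4.0]]
    expected = [[1.5, -2.0], [-0.3, 4.0]]  # hand-verified golden output
    assert convert(rows) == expected

test_conversion_preserves_sign()
print("golden-output check passed")
```

The value of such a test lies less in its sophistication than in its existence: a single hand-verified input and expected output, checked on every change, turns a silent data corruption into an immediate failure.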