that in fact even a small test suite can achieve much better code coverage than
a very large corpus.
As a reviewer pointed out, most linguistic phenomena are Zipfian in nature.
How far must we go in evaluating and handling the phenomena in the Zipfian
tail? Steedman has an insightful observation on this question:
We have come to believe that the linguists have forgotten Zipf's law,
which says that most of the variance in linguistic behavior can be
captured by a small part of the system.
The linguists, on the other hand, think that it is we who have forgotten
Zipf's law, which also says that most of the information about the
language system as a whole is in the Long Tail.
It is we who are at fault here, because the machine learning techniques
that we rely on are actually very bad at inducing systems for which the
crucial information is in rare events...
One day... the Long Tail will come back to haunt us.
[21]
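Steedman's point about the head and the tail can be made concrete with a toy calculation (a sketch of our own, not drawn from any of the cited work): under Zipf's law, the frequency of the rank-r type is roughly proportional to 1/r, so a handful of high-ranked types carries most of the token mass, while most of the vocabulary, and hence most of the information about the system as a whole, sits in the tail.

```python
# Toy illustration of Zipf's law: for a 50,000-type vocabulary with
# frequency proportional to 1/rank, compute how much of the token mass
# the top-k types cover. (Vocabulary size and exponent are illustrative.)
V = 50_000
weights = [1 / r for r in range(1, V + 1)]
total = sum(weights)

def head_mass(k):
    """Fraction of all tokens accounted for by the k most frequent types."""
    return sum(weights[:k]) / total

for k in (100, 1_000, 10_000):
    print(f"top {k:>6} types cover {head_mass(k):.1%} of tokens")
```

Under these assumptions, the top 100 of 50,000 types already cover nearly half of all tokens, which is why a system tuned on frequent events looks good on aggregate metrics while the tail, where the rare events live, goes largely unexercised.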
Even for work whose goal is not application-building but basic research, the
costs of failing to attend to basic software testing and quality assurance issues
can be quite severe. As Rob Knight has put it, “For scientific work, bugs don't
just mean unhappy users who you'll never actually meet: they mean retracted
publications and ended careers. It is critical that your code be fully tested before
you draw conclusions from the results it produces.” The recent case of Geoffrey
Chang (see [16] for a succinct discussion) is illustrative. In 2006, he was a star
of the protein crystallography world. That year he discovered a simple error
in his code that reversed the sign (positive versus negative) of two columns of
numbers in its output. This led to a reversed prediction of handedness in the
ABC transporter gene MsbA, an error with implications for the work of many
other scientists in addition to his own. The story is an object lesson in the
potential consequences of failing to attend to basic software testing and quality
assurance issues, although his principled response to the situation suggests that
in his case, those consequences will be limited to retracted publications and will
not be career-ending (see [5] for the retractions). For the
sorts of standard software testing techniques that we looked for in the work
reported here, a considerable amount of good material is available, ranging from
cookbook-like how-to manuals (e.g. [13]) to theoretical work [3,14,4]. Language
processing presents a number of specific testing issues related to unique
characteristics of the input data, and the literature on it is quite limited (but see
[6,12,7] for some attempts to address this topic in the biomedical natural
language processing domain, specifically). No non-trivial application is ever likely
to be completely free of bugs, but that does not free us of the obligation to test
for them. As we have shown here, approaches to doing so that are inspired by
linguistic techniques are effective at granular characterization of performance,
finding bugs, and achieving high code coverage.
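As a concrete (and entirely hypothetical) illustration of the kind of inexpensive check that would catch a Chang-style sign flip, consider a golden-output regression test for a data-conversion step: the function and data below are invented for illustration and do not reflect Chang's actual code.

```python
# Hypothetical golden-output regression test for a data-conversion step.
# A sign reversal in the output columns (as in the Chang case) would make
# the assertion fail immediately on a small, hand-verified input.

def convert(rows):
    """Toy stand-in for the conversion step under test (here: identity)."""
    return [list(r) for r in rows]

def test_conversion_preserves_sign():
    rows = [[1.5, -2.0], [-0.3, 4.0]]
    expected = [[1.5, -2.0], [-0.3, 4.0]]  # hand-verified golden output
    assert convert(rows) == expected

test_conversion_preserves_sign()
print("golden-output check passed")
```

The value of such a test lies less in its sophistication than in its existence: a single hand-verified input and expected output, checked on every change, turns a silent data corruption into an immediate failure.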