Table 1. Summary of behaviors. 7 of 20 sites returned the “bad” or “worst” type of results.

Response type                                   Sites
Good (prompt or input screen displayed)            13
Bad (invalid-appearing or false 0 returned)         6
Worst (valid-appearing data returned)               1
Although global evaluation of the performance of natural language processing applications with metrics such as F-measure is well established, there has been much less work on granular evaluation of the performance of such applications. (In the remainder of the paper, there is a deliberate blurring between what Palmer and Finin have called glass-box evaluation, that is, fine-grained evaluation of specific linguistic features [18], which I refer to as granular evaluation, and finding errors in performance, or bugs. As will be seen, it is fruitful to blur this distinction.) There has been
correspondingly little research on methods for doing so. We describe here a
methodology for granular evaluation of the performance of natural language
processing applications using techniques from traditional software testing and
from linguistics. Software testing conventionally makes use of test suites. A test
suite is a set of test inputs with known desired outputs that is structured so as
to explore the feature space of a specified type of input. Test cases are built by
determining the set of features that a type of input might have and the contexts
in which those features might be found. For a simple example, a function that
takes numbers as inputs might be tested with a test suite that includes integers,
real numbers, positive numbers, negative numbers, and zero. Good testing also
includes a suite of “dirty” or unexpected inputs—for example, the function that
takes numbers as inputs might be passed a null variable, a non-null but empty
variable, and letters.
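To make this concrete, a test suite for such a number-taking function might be organized along the following lines (a minimal sketch in Python; the function name parse_number and the expected outputs are assumptions made for illustration, not part of any particular application):

import math


def parse_number(text):
    """Hypothetical function under test: convert a string to a number."""
    return float(text)


# "Clean" cases explore the feature space of well-formed inputs.
clean_cases = [
    ("42", 42.0),     # integer
    ("3.14", 3.14),   # real number
    ("7", 7.0),       # positive number
    ("-7", -7.0),     # negative number
    ("0", 0.0),       # zero
]

# "Dirty" cases are unexpected inputs that should be rejected.
dirty_cases = [None, "", "abc"]

for text, expected in clean_cases:
    assert math.isclose(parse_number(text), expected), text

for bad in dirty_cases:
    try:
        parse_number(bad)
    except (TypeError, ValueError):
        pass  # rejection is the desired outcome for a dirty input
    else:
        raise AssertionError("dirty input was accepted: %r" % (bad,))

Each clean case pairs an input exhibiting one feature of interest with its known desired output; the dirty cases probe how the function behaves when its input assumptions are violated.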
There is a theoretical background for test suite construction. It turns out to
be overwhelmingly similar to the formal foundations of linguistics. Indeed, if
one examines the table of contents of a book on the theory of software testing
(see several listed below) and Partee et al.'s textbook on the formal founda-
tions of linguistics [19], one finds very similar chapters. The table of contents of
[19] includes chapters on basic concepts of set theory, relations and functions,
properties of relations, basic concepts of logic and formal systems, statement
logic, predicate logic, finite automata, formal languages, and Type 3 grammars.
Similarly, if we look at the contents of a good book on software testing, we
see coverage of set theory [2], graphs and relations [3], logic [2], and automata
[2,3,14].
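To make the overlap concrete, a Type 3 (regular) grammar and its equivalent finite automaton are exactly the kinds of objects that appear in both sets of chapters. The sketch below (Python, purely illustrative and drawn from neither book) encodes a small finite automaton recognizing signed integer strings, the sort of recognizer a tester might use to decide whether a generated input is well-formed:

# Finite automaton for the regular language (-)?[0-9]+ :
# an optional minus sign followed by one or more digits.
TRANSITIONS = {
    ("start", "-"): "sign",
    ("start", "digit"): "digits",
    ("sign", "digit"): "digits",
    ("digits", "digit"): "digits",
}
ACCEPTING = {"digits"}


def classify(symbol):
    """Map a character onto the automaton's input alphabet."""
    return "digit" if symbol.isdigit() else symbol


def accepts(text):
    """Return True if the automaton accepts the whole string."""
    state = "start"
    for symbol in text:
        state = TRANSITIONS.get((state, classify(symbol)))
        if state is None:
            return False
    return state in ACCEPTING


assert accepts("-42") and accepts("0")
assert not accepts("") and not accepts("4-2") and not accepts("abc")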
The theoretical similarities between software testing and linguistics turn out
to translate into practical methodologies, as well. In particular, the techniques of
software testing have much in common with the techniques of descriptive or field
linguistics—the specialty of determining the structures and functioning of an un-
known language. In the case of software testing, an application is approached
by determining the features of inputs and combinations of inputs (both “clean”