Table 1. Summary of behaviors. 7 of 20 sites returned the “bad” or “worst” type of results.

Response type                                   Sites
Good (prompt or input screen displayed)            13
Bad (invalid-appearing or false 0 returned)         6
Worst (valid-appearing data returned)               1
Although global evaluation of the performance of natural language processing applications with metrics such as F-measure is well established, there has been much less work on granular evaluation of the performance of such applications. (In the remainder of the paper, there is a deliberate blurring between what Palmer and Finin have called glass-box evaluation, that is, fine-grained evaluation of specific linguistic features [18], which I refer to as granular evaluation, and finding errors in performance, or bugs. As will be seen, it is fruitful to blur this distinction.) There has been
correspondingly little research on methods for doing so. We describe here a
methodology for granular evaluation of the performance of natural language
processing applications using techniques from traditional software testing and
from linguistics. Software testing conventionally makes use of test suites. A test
suite is a set of test inputs with known desired outputs that is structured so as
to explore the feature space of a specified type of input. Test cases are built by
determining the set of features that a type of input might have and the contexts
in which those features might be found. For a simple example, a function that
takes numbers as inputs might be tested with a test suite that includes integers,
real numbers, positive numbers, negative numbers, and zero. Good testing also
includes a suite of “dirty” or unexpected inputs—for example, the function that
takes numbers as inputs might be passed a null variable, a non-null but empty
variable, and letters.
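To make this concrete, a test suite for such a number-taking function might be organized along the following lines (a minimal sketch in Python; the function name parse_number and the expected outputs are assumptions made for illustration, not part of any particular application):

import math


def parse_number(text):
    """Hypothetical function under test: convert a string to a number."""
    return float(text)


# "Clean" cases explore the feature space of well-formed inputs.
clean_cases = [
    ("42", 42.0),     # integer
    ("3.14", 3.14),   # real number
    ("7", 7.0),       # positive number
    ("-7", -7.0),     # negative number
    ("0", 0.0),       # zero
]

# "Dirty" cases are unexpected inputs that should be rejected.
dirty_cases = [None, "", "abc"]

for text, expected in clean_cases:
    assert math.isclose(parse_number(text), expected), text

for bad in dirty_cases:
    try:
        parse_number(bad)
    except (TypeError, ValueError):
        pass  # rejection is the desired outcome for a dirty input
    else:
        raise AssertionError("dirty input was accepted: %r" % (bad,))

Each clean case pairs an input exhibiting one feature of interest with its known desired output; the dirty cases probe how the function behaves when its input assumptions are violated.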
There is a theoretical background for test suite construction. It turns out to
be overwhelmingly similar to the formal foundations of linguistics. Indeed, if
one examines the table of contents of a book on the theory of software testing
(see several listed below) and Partee et al.'s textbook on the formal founda-
tions of linguistics [19], one finds very similar chapters. The table of contents of
[19] includes chapters on basic concepts of set theory, relations and functions,
properties of relations, basic concepts of logic and formal systems, statement
logic, predicate logic, finite automata, formal languages, and Type 3 grammars.
Similarly, if we look at the contents of a good book on software testing, we
see coverage of set theory [2], graphs and relations [3], logic [2], and automata
[2,3,14].
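To make the overlap concrete, a Type 3 (regular) grammar and its equivalent finite automaton are exactly the kinds of objects that appear in both sets of chapters. The sketch below (Python, purely illustrative and drawn from neither book) encodes a small finite automaton recognizing signed integer strings, the sort of recognizer a tester might use to decide whether a generated input is well-formed:

# Finite automaton for the regular language (-)?[0-9]+ :
# an optional minus sign followed by one or more digits.
TRANSITIONS = {
    ("start", "-"): "sign",
    ("start", "digit"): "digits",
    ("sign", "digit"): "digits",
    ("digits", "digit"): "digits",
}
ACCEPTING = {"digits"}


def classify(symbol):
    """Map a character onto the automaton's input alphabet."""
    return "digit" if symbol.isdigit() else symbol


def accepts(text):
    """Return True if the automaton accepts the whole string."""
    state = "start"
    for symbol in text:
        state = TRANSITIONS.get((state, classify(symbol)))
        if state is None:
            return False
    return state in ACCEPTING


assert accepts("-42") and accepts("0")
assert not accepts("") and not accepts("4-2") and not accepts("abc")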
The theoretical similarities between software testing and linguistics turn out
to translate into practical methodologies, as well. In particular, the techniques of
software testing have much in common with the techniques of descriptive or field
linguistics—the specialty of determining the structures and functioning of an un-
known language. In the case of software testing, an application is approached
by determining the features of inputs and combinations of inputs (both “clean”