code in a wide variety of types of software, and a Food and Drug Administration
analysis of 3,140 medical device recalls in the 1990s concluded that 7.7%
of them (242/3,140) were due to software errors [8] (p. 7). Given the stated
intent to provide “mission-critical” tools to doctors and researchers, one might
expect due diligence with regard to the quality of software artifacts to be a
commonplace in the biomedical natural language processing community and an
established subfield of its research milieu. Surprisingly, that is not the case: on
the widest imaginable definition of quality assurance, there are fewer than a
dozen published studies on quality assurance for biomedical natural language
processing software, despite the high (and rapidly growing) level of activity in
the biomedical natural language processing area reported in [24] and reviewed
in work such as [25]. Given the apparently urgent need for biomedical natural
language processing tools that many papers claim in an introductory paragraph
citing the latest count of papers in PubMed/MEDLINE, it seems plausible that
although researchers in the area are exercising due diligence with respect to the
artifacts that they produce, they simply are not taking the time to do research
on quality assurance per se. We assayed the extent to which this might be the
case, and report the results here.
3 Methods and Results for Assessing Natural Language
Processing Applications with Respect to Software
Testing and Quality Assurance
Our methodology was simple. We examined 20 web sites that either provide
some form of text mining service (e.g. gene name identification or protein-protein
interaction extraction) or provide access to the output of text mining (e.g. a text-
mining-produced database). On each web site, we tried the most basic software
test imaginable. This test, which our experience suggests is probably the first
action performed by a typical professional software tester presented with any
new application to test, consists of passing the application an empty input. For
many web sites, the test consisted of simply hitting the “Submit” button or its
equivalent. For some web sites, this was first preceded by clearing sample input
from a text box. This is indeed the simplest and most basic software test of which
we are aware. We make the following (undoubtedly simplified) assumption: if the
system builders paid any attention to software testing and quality assurance at
all, they will have run this test; evidence that they tried the test will be that the
system responds to a blank input by prompting the user to populate the empty
field.
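To make the procedure concrete, the following Python sketch shows how such an empty-input test might be automated against a web-based text mining service. It is a minimal sketch, not the harness we actually used: the endpoint URL, the form field name, and the string matching used to classify the response are all hypothetical placeholders, and in practice we performed the test manually through each site's own submission form.

```python
"""Minimal sketch of the empty-input test described above.

The endpoint URL and form field name are hypothetical; a real service
would substitute its own. The test submits an empty input and records
whether the service appears to prompt the user to supply input.
"""
import urllib.error
import urllib.parse
import urllib.request

SERVICE_URL = "https://example.org/text-mining/submit"  # hypothetical endpoint


def empty_input_test(url: str, field: str = "text") -> str:
    """Submit an empty value for the given form field and return a rough
    classification of the service's response."""
    data = urllib.parse.urlencode({field: ""}).encode("utf-8")
    try:
        with urllib.request.urlopen(url, data=data, timeout=30) as response:
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A 4xx/5xx status on empty input suggests no graceful handling.
        return f"HTTP error {err.code}"
    # Crude heuristic standing in for manual inspection of the returned page.
    if "please" in body.lower() and "input" in body.lower():
        return "prompted user for input"  # the preferred behaviour
    return "no prompt detected; manual inspection needed"


if __name__ == "__main__":
    print(empty_input_test(SERVICE_URL))
```

In our study the classification of each response was done by hand; the heuristic check above merely illustrates the kind of evidence (a prompt to populate the empty field) that we took as indicating the test had been run by the system builders.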
What constitutes an appropriate response to an empty input? We propose that
the best response to an empty input where a non-empty input was expected is
to give the user helpful feedback—to prompt the user to provide an input. For a
GUI-based application, the next-best response is probably Google's strategy—to
do nothing, and present the user with the exact same input screen. (For an API,
the second-best response may well be to throw an uncaught exception—this has