code in a wide variety of types of software, and a Food and Drug Administration
analysis of 3,140 medical device recalls in the 1990s concluded that 7.7%
of them (242/3,140) were due to software errors [8] (p. 7). Given the stated
intent to provide “mission-critical” tools to doctors and researchers, one might
expect due diligence with regard to the quality of software artifacts to be a
commonplace in the biomedical natural language processing community and an
established subfield of its research milieu. Surprisingly, that is not the case: on
the widest imaginable definition of quality assurance, there are fewer than a
dozen published studies on quality assurance for biomedical natural language
processing software, despite the high (and rapidly growing) level of activity in
the biomedical natural language processing area reported in [24] and reviewed
in work such as [25]. Given the apparently urgent need for biomedical natural
language processing tools that many papers claim in an introductory paragraph
citing the latest count of papers in PubMed/MEDLINE, it seems plausible that
although researchers in the area are exercising due diligence with respect to the
artifacts that they produce, they simply are not taking the time to do research
on quality assurance per se. We assayed the extent to which this might be the
case, and report the results here.
3 Methods and Results for Assessing Natural Language
Processing Applications with Respect to Software
Testing and Quality Assurance
Our methodology was simple. We examined 20 web sites that either provide
some form of text mining service (e.g. gene name identification or protein-protein
interaction extraction) or provide access to the output of text mining (e.g. a text-
mining-produced database). On each web site, we tried the most basic software
test imaginable. This test, which our experience suggests is probably the first
action performed by a typical professional software tester presented with any
new application to test, consists of passing the application an empty input. For
many web sites, the test consisted of simply hitting the “Submit” button or its
equivalent. For some web sites, this was first preceded by clearing sample input
from a text box. This is indeed the simplest and most basic software test of which
we are aware. We make the following (undoubtedly simplified) assumption: if the
system builders paid any attention to software testing and quality assurance at
all, they will have run this test; evidence that they tried the test will be that the
system responds to a blank input by prompting the user to populate the empty
field.
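To make the procedure concrete, the following Python sketch shows how such an empty-input test might be automated against a web-based text mining service. It is a minimal sketch, not the harness we actually used: the endpoint URL, the form field name, and the string matching used to classify the response are all hypothetical placeholders, and in practice we performed the test manually through each site's own submission form.

```python
"""Minimal sketch of the empty-input test described above.

The endpoint URL and form field name are hypothetical; a real service
would substitute its own. The test submits an empty input and records
whether the service appears to prompt the user to supply input.
"""
import urllib.error
import urllib.parse
import urllib.request

SERVICE_URL = "https://example.org/text-mining/submit"  # hypothetical endpoint


def empty_input_test(url: str, field: str = "text") -> str:
    """Submit an empty value for the given form field and return a rough
    classification of the service's response."""
    data = urllib.parse.urlencode({field: ""}).encode("utf-8")
    try:
        with urllib.request.urlopen(url, data=data, timeout=30) as response:
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A 4xx/5xx status on empty input suggests no graceful handling.
        return f"HTTP error {err.code}"
    # Crude heuristic standing in for manual inspection of the returned page.
    if "please" in body.lower() and "input" in body.lower():
        return "prompted user for input"  # the preferred behaviour
    return "no prompt detected; manual inspection needed"


if __name__ == "__main__":
    print(empty_input_test(SERVICE_URL))
```

In our study the classification of each response was done by hand; the heuristic check above merely illustrates the kind of evidence (a prompt to populate the empty field) that we took as indicating the test had been run by the system builders.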
What constitutes an appropriate response to an empty input? We propose that
the best response to an empty input where a non-empty input was expected is
to give the user helpful feedback—to prompt the user to provide an input. For a
GUI-based application, the next-best response is probably Google's strategy—to
do nothing, and present the user with the exact same input screen. (For an API,
the second-best response may well be to throw an uncaught exception—this has