Assessment of Software Testing and Quality Assurance in Natural Language Processing Applications and a Linguistically Inspired Approach to Improving It - Trustworthy Eternal Systems via Evolving Software, Data and Knowledge


progress has been made recently in addressing the scientific challenges of creating computer programs that can properly handle the complexities of human language. However, the transition from a demonstration of scientific progress to the production of tools on which a broader community can depend requires that fundamental software engineering requirements be addressed. Software for medical devices has the benefit of explicit quality assurance requirements per Section 201(h) of the Federal Food, Drug, and Cosmetic Act; Title 21 of the Code of Federal Regulations Part 820; and 61 Federal Register 52602 [8] (p. 7). However, unless it is embedded in a medical device, biomedical natural language processing software is not currently subject to federal quality assurance requirements.

This paper represents the first attempt to characterize the state of one portion of the diverse world of computational bioscience software, specifically biomedical natural language processing applications, with respect to software testing and quality assurance. We assay a broad range of biomedical natural language processing services that are made available via web sites for evidence of quality assurance processes. Our findings suggest that software testing and quality assurance are currently lacking in the community that produces biomedical natural language processing tools. For the tool consumer, this finding should serve as a note of caution.

2 Approach to Assessing the State of Natural Language Processing Applications with Respect to Software Testing and Quality Assurance

We looked at twenty web sites offering a variety of text-mining-related services. In the body of this work, we never identify them by name: following the tradition in natural language processing, we do not want to punish people for making their work freely available. Our purpose is not to point fingers; indeed, one of our own services is every bit as lacking as any other in most or all of the measures that we describe below. Rather, our goal is to allow the community to make a realistic assessment of the state of the art with respect to software testing and quality assurance for biomedical natural language processing systems, with the hope of stimulating a healthy change.

The claim to have produced a useful tool is commonplace in the biomedical natural language processing literature. The explicitly stated motivation for much work in the field is to assist in the understanding of disease or of life, not to advance the state of computer science or the understanding of natural (i.e., human) language. (In this, the biomedical natural language processing community differs from the mainstream NLP community, which at least in theory is motivated by a desire to investigate hypotheses about NLP or about natural language, not to produce tools.) Software is widely known to be characterized by “bugs,” or undesired behaviors: [15] reviews a wide range of studies that suggest an industry-average error rate of 1 to 25 bugs per thousand lines of
