Table 3. Application- and package-level coverage statistics using the test suite, the full corpus with the full set of rules, and the full corpus with two reduced sets of rules. The highest value in each row is in the Functional tests column. The last three columns are intentionally identical [7].

Metric                   Functional tests   Corpus, all rules   Corpus, nominal rules   Corpus, verbal rules
Overall line coverage    56%                41%                 41%                     41%
Overall branch coverage  41%                28%                 28%                     28%
Parser line coverage     55%                41%                 41%                     41%
Parser branch coverage   57%                29%                 29%                     29%
Rules line coverage      63%                42%                 42%                     42%
Rules branch coverage    71%                24%                 24%                     24%
Parser class coverage    88% (22/25)        80% (20/25)         80% (20/25)             80% (20/25)
Rules class coverage     100% (20/20)       90% (18/20)         90% (18/20)             90% (18/20)
coverage, and sometimes much higher coverage, as in the case of branch coverage for the rules components, where the corpus achieved 24% code coverage and the test suite achieved 71%. The last three columns show the results of an experiment in which we varied the size of the rule set. The coverage for the entire rule set, for the partition containing only nominal rules, and for the partition containing only verbal rules is identical, so the number of rules processed was not a determinant of code coverage.
In a further experiment, we examined how code coverage is affected by vari-
ations in the size of the corpus. We monitored coverage as increasingly larger
portions of the corpus were processed. The results for line coverage are shown
in Figure 1. (The results for branch coverage are very similar and are not shown.)
The x axis shows the number of sentences processed. The thick solid line indi-
cates line coverage for the entire application. The thin solid line indicates line
coverage for the rules package. The broken line and the right y axis indicate the
number of pattern matches.
As the figure shows quite clearly, increasing the size of the corpus does not lead
to increasing code coverage. It is 39% when a single sentence has been processed,
40% when 51 sentences have been processed, and 41%—the highest value that
it will reach—when 1,000 sentences have been processed. The coverage after
processing 191,478 sentences—the entire corpus of almost 4,000,000 words—is no
higher than it was at 1,000 sentences, and is barely higher than after processing
a single sentence.
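The plateau described above can be sketched with a toy model: treat each sentence as exercising a set of rules, and cumulative coverage as the number of distinct rules seen so far. The function name and the token-based corpus below are hypothetical illustrations, not the instrumentation actually used in the experiment (which measured line and branch coverage of the application code).

```python
# Toy model of the coverage plateau: "coverage" here is the count of
# distinct rules exercised so far. Once the common rules have fired,
# additional sentences add little or nothing.
def coverage_curve(corpus, rules_fired):
    """Return the cumulative distinct-rule count after each sentence."""
    seen = set()
    curve = []
    for sentence in corpus:
        seen.update(rules_fired(sentence))  # rules this sentence exercises
        curve.append(len(seen))
    return curve

# Hypothetical example: tokens stand in for the rules a sentence triggers.
corpus = ["the cat sleeps", "the dog sleeps", "a cat runs", "the cat runs"]
curve = coverage_curve(corpus, lambda s: set(s.split()))
# curve == [3, 4, 6, 6] -- the last sentence adds nothing new
```

The curve rises quickly on the first few sentences and then flattens, which mirrors the observed behavior: 39% after one sentence, 41% after 1,000, and no gain from the remaining 190,000.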
Thus, we see that the “naturally occurring data assumption” does not hold—
from an engineering perspective, there is a clear advantage to using structured
test suites.
This should not be taken as a claim that running an application against a
large corpus is bad. In fact, we routinely do this, and have found bugs that were
not uncovered in other ways. However, testing with a structured test suite should
remain a primary element of natural language processing software testing.