4.2 Branch Coverage as Measure of Testing Effectiveness
To assess branch coverage as a measure of testing effectiveness, one must understand
that running random testing longer is equivalent to adding new test cases to a test suite.
The reason is that testing for a longer time means that more routine calls are executed on
the class under test. Each routine call is in effect the last line of a test case that contains
all previous calls contributing to the state of the data used in that call (see [14] for a
detailed explanation of test case construction and simplification). To push the analogy
further, testing a class in different runs is equivalent to providing different test suites
for that class.
Our experiments test production code in which the existing number of faults is
unknown. They do not seed faults in the code but merely test for discrepancies between
the contracts and the code. As a result, it is not possible to use the ratio of detected
faults to the total number of faults to measure the effectiveness of testing. Instead,
we assess testing effectiveness through two parameters: the number of faults detected
and the speed at which those faults are detected.
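As a minimal sketch of these two parameters, assume a test run is logged as a list of (minute, fault identifier) pairs. The log format, the function names, and the sample data are illustrative assumptions, not the tooling or results of the experiments.

```python
def first_detections(log):
    """Map each distinct fault to the minute at which it was first detected."""
    first = {}
    for minute, fault in log:
        if fault not in first or minute < first[fault]:
            first[fault] = minute
    return first


def faults_found_by(log, minute):
    """Number of distinct faults detected up to and including `minute`."""
    return sum(1 for t in first_detections(log).values() if t <= minute)


# Made-up log, for illustration only: (minute, fault identifier).
log = [(1, "F1"), (2, "F2"), (2, "F1"), (45, "F3"), (170, "F4")]

total_faults = len(first_detections(log))                      # number of faults detected
detection_curve = [faults_found_by(log, m) for m in (5, 60, 180)]  # detection speed
```

The first value captures how many faults a run finds, and the curve shows how quickly they accumulate over time.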
Two results show that different faults can be detected at the same level of branch
coverage: (1) in a test run, new faults were detected in a period where branch
coverage hardly changed; (2) in different test runs for the same class, different faults
were detected while almost the same branches were exercised. In other words, different
test suites satisfying the same branch coverage criterion may detect different faults.
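One way to make observation (2) concrete is to compare two runs by the overlap of their covered branches and of their detected faults. The sketch below uses the Jaccard index for this; the choice of measure, the record layout, and the data are assumptions made for illustration and do not come from the experiments.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |a & b| / |a | b| (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up runs of the same class, for illustration only.
run_1 = {"branches": {"b1", "b2", "b3", "b4"}, "faults": {"F1", "F2"}}
run_2 = {"branches": {"b1", "b2", "b3", "b4"}, "faults": {"F2", "F3"}}

branch_similarity = jaccard(run_1["branches"], run_2["branches"])
fault_similarity = jaccard(run_1["faults"], run_2["faults"])
only_in_run_2 = run_2["faults"] - run_1["faults"]
```

A high branch similarity combined with a low fault similarity is the pattern the text describes: the same coverage, but different faults.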
These two observations indicate that test adequacy in terms of branch coverage level
is highly predictable, not only in how many branches are covered, but also in which
branches are covered. Applying random testing to a class always yields the same
level of branch coverage adequacy. Also, for all the tested classes, the branch coverage
adequacy level stabilizes after some time (1 hour in our case).
Although we do not know how many faults remain in the tested classes, it was
astonishing to discover that over 50% of the faults found appeared only in the period
when branch coverage stagnated.
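The share of faults found during stagnation can be computed along the following lines. This is a sketch under assumptions: stagnation is taken to begin at the earliest sample after which coverage grows by at most `epsilon` branches, and the function names, the threshold, and the data are all hypothetical.

```python
def stagnation_start(coverage_samples, epsilon=1):
    """Earliest sampled minute from which coverage grows by at most `epsilon` branches.

    `coverage_samples` is a list of (minute, covered_branch_count) pairs sorted by
    minute, with non-decreasing counts.
    """
    final_count = coverage_samples[-1][1]
    for minute, count in coverage_samples:
        if final_count - count <= epsilon:
            return minute


def fraction_found_after(first_detection_minutes, start):
    """Share of faults whose first detection happened after minute `start`."""
    late = [m for m in first_detection_minutes if m > start]
    return len(late) / len(first_detection_minutes)


# Made-up data, for illustration only.
coverage_samples = [(1, 40), (5, 70), (10, 80), (60, 81), (180, 81)]
first_detection_minutes = [2, 4, 30, 90, 150]

start = stagnation_start(coverage_samples)
late_share = fraction_found_after(first_detection_minutes, start)
```

The result depends on how stagnation is defined, so the threshold should be reported alongside any such figure.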
These results provide evidence of the lack of reliability [8] of the branch coverage
criterion when achieved by random testing. Reliability requires that a test criterion
always produce consistent results. In the experiments reported here, this means that two
test runs achieving the same branch coverage of a class should deliver similar numbers
of faults. But the results show that the numbers of faults found in different test runs
differ from each other by at least 50%.
What about the speed of fault detection? In the first few minutes of random testing,
branch coverage increases quickly, and the number of faults increases accordingly,
with a strong correlation. This means that branch coverage is a good measure of
testing effectiveness in the first few minutes. But after a while, the branch coverage level
hardly increases. The fault detection speed also slows down, but less dramatically than
the growth of branch coverage. In fact, many faults are detected in the period where the
branch coverage hardly changes. This means that in the later period, branch coverage is
not a good measure of testing effectiveness.
In general, to detect as many faults as possible, branch coverage is necessary but not
sufficient.
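The contrast between early and late fault detection speed discussed in this section can be illustrated by comparing average growth rates in two time windows. The sketch below is only an illustration: the window boundaries, the `rate` helper, and the sample values are assumptions, not measurements from the experiments.

```python
def rate(samples, start, end):
    """Average increase per minute between two sampled minutes."""
    values = dict(samples)
    return (values[end] - values[start]) / (end - start)


# Made-up cumulative samples, for illustration only: (minute, count).
coverage = [(0, 0), (10, 80), (180, 82)]  # covered branches
faults = [(0, 0), (10, 6), (180, 14)]     # distinct faults detected

# Ratio of the late rate to the early rate; smaller means a sharper slowdown.
coverage_slowdown = rate(coverage, 10, 180) / rate(coverage, 0, 10)
fault_slowdown = rate(faults, 10, 180) / rate(faults, 0, 10)
```

With data of this shape, both rates drop after the first window, but the coverage rate drops much more sharply than the fault detection rate, which is why coverage stops tracking effectiveness later in a run.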
 