Hutchins et al. [13] also compared the effectiveness of the branch coverage criterion
and the all-uses criterion. They found that for both criteria, test sets achieving coverage
levels over 90% showed significantly better fault detection than randomly selected test
sets of the same size. This means that many faults can be detected as the coverage
level approaches 100%. They also concluded that, in terms of effectiveness, there
is no winner between the branch coverage and all-uses criteria. Our results on the
correlation between the branch coverage level and the number of detected faults also show
a similar pattern: many faults are detected at higher coverage levels. In our experiment,
however, the branch coverage level did not reach 100%, while in their study manually
written test sets guaranteed total branch coverage. Also, in their study, programs under
test were seeded with faults, while in our experiment programs were tested as they are.
Gupta et al. [9] compared the effectiveness (the ability to detect faults) and
efficiency (the average cost for detecting a fault) of three code coverage criteria: predicate
coverage, branch coverage, and block coverage. They found that predicate coverage is
the most effective but the least efficient, block coverage is the least effective but the most
efficient, and branch coverage lies between predicate coverage and block coverage in
terms of both effectiveness and efficiency. Their results suggest that, of the three
criteria, branch coverage is the best choice for obtaining good results with moderate
testing effort.
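The two measures compared by Gupta et al. can be made concrete with a short sketch. It takes effectiveness as the fraction of known faults detected and efficiency as the average cost per detected fault, following the parenthetical definitions above. The record type, the function names, the cost unit, and all numbers are hypothetical; the numbers are chosen only to mirror the qualitative ordering reported and are not data from [9].

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """Outcome of testing to one coverage criterion (hypothetical record)."""
    name: str
    faults_detected: int
    total_faults: int
    cost: float  # assumed unit: e.g. number of test cases executed


def effectiveness(r: CriterionResult) -> float:
    """Ability to detect faults: fraction of the known faults detected."""
    return r.faults_detected / r.total_faults


def cost_per_fault(r: CriterionResult) -> float:
    """Average cost of detecting one fault (lower means more efficient)."""
    if r.faults_detected == 0:
        return float("inf")
    return r.cost / r.faults_detected


# Made-up numbers for illustration only; they are NOT results from [9].
results = [
    CriterionResult("predicate", faults_detected=18, total_faults=20, cost=900.0),
    CriterionResult("branch", faults_detected=16, total_faults=20, cost=480.0),
    CriterionResult("block", faults_detected=12, total_faults=20, cost=240.0),
]

# Rank from most to least effective, and from most to least efficient.
by_effectiveness = sorted(results, key=effectiveness, reverse=True)
by_efficiency = sorted(results, key=cost_per_fault)

print([r.name for r in by_effectiveness])
print([r.name for r in by_efficiency])
```

With these invented inputs the two rankings are the reverse of each other, and branch coverage sits in the middle of both, which is the trade-off the paragraph describes.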
7 Conclusions and Future Work
This article has shown that the branch coverage level achieved by random testing varies
depending on the structure of the program under test but was very high on the classes
we tested (93% on average). Most of the branches exercised by random testing are
exercised very quickly (in the first 10 minutes of testing), regardless of the class under
test. For the same class, the branches exercised in different test runs are almost the same.
Different test runs on the same class detect roughly 10% different faults.
Our results also confirm that branch coverage in general is not a good indicator of
the quality of a test suite. In the experiments, more than 50% of the faults are detected
while coverage is at a plateau. Although many studies have shown the weakness of branch
coverage, there has been little evidence showing that random testing finds new faults
while the branch coverage stagnates.
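One way to quantify "faults detected while coverage is at a plateau" is sketched below. This section does not define when the plateau begins, so the sketch assumes it starts at the first sample whose covered-branch count is within a small tolerance of the final count. The sample format, the function names, the tolerance, and the timeline values are all hypothetical, not measured data.

```python
from typing import List, Tuple

# (minutes elapsed, branches covered so far, distinct faults found so far)
Sample = Tuple[float, int, int]


def plateau_start(samples: List[Sample], tolerance: int = 0) -> float:
    """Time of the first sample within `tolerance` branches of the final count.

    Assumes a non-empty list of samples ordered by time.
    """
    final_covered = samples[-1][1]
    for minutes, covered, _ in samples:
        if final_covered - covered <= tolerance:
            return minutes
    return samples[-1][0]


def fault_share_on_plateau(samples: List[Sample], tolerance: int = 0) -> float:
    """Fraction of all detected faults that were found after the plateau began."""
    start = plateau_start(samples, tolerance)
    total_faults = samples[-1][2]
    if total_faults == 0:
        return 0.0
    found_by_start = max(f for m, _, f in samples if m <= start)
    return (total_faults - found_by_start) / total_faults


# Hypothetical timeline of a single test run; not measured data.
timeline = [
    (0, 0, 0),
    (5, 80, 3),
    (10, 92, 5),
    (30, 93, 8),
    (60, 93, 10),
    (90, 93, 12),
]

print(plateau_start(timeline, tolerance=1))
print(fault_share_on_plateau(timeline, tolerance=1))
```

The result depends on the chosen tolerance, so any reported share should state how the plateau was delimited.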
Our results indicate that branch coverage is not a good stopping criterion for
random testing. One should test a program in multiple test runs to find as many faults as
possible, even though doing so will generally not increase the branch coverage level.
Also, one should not stop random testing even if the branch coverage level
stops increasing or increases only very slowly.
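The effect of repeating test runs can be illustrated with a small sketch that tracks the union of exercised branches and the union of detected faults across runs. The function name and the branch and fault identifiers are hypothetical, and the sets are invented for illustration rather than taken from the experiments.

```python
from typing import List, Set


def cumulative_union_sizes(runs: List[Set[str]]) -> List[int]:
    """Size of the union of the first k sets, for k = 1..len(runs)."""
    seen: Set[str] = set()
    sizes = []
    for items in runs:
        seen |= items
        sizes.append(len(seen))
    return sizes


# Hypothetical results of three test runs on the same class; not measured data.
branches_per_run = [
    {"b1", "b2", "b3", "b4"},
    {"b1", "b2", "b3", "b4"},
    {"b1", "b2", "b3", "b4"},
]
faults_per_run = [
    {"f1", "f2", "f3"},
    {"f1", "f2", "f4"},
    {"f2", "f3", "f5"},
]

print(cumulative_union_sizes(branches_per_run))  # [4, 4, 4]
print(cumulative_union_sizes(faults_per_run))    # [3, 4, 5]
```

In this invented example the set of exercised branches stops growing after the first run, while each additional run still contributes a fault not seen before.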
As a continuation of this work, we are investigating how to reach even higher
branch coverage (100% or very close) and how to devise a good stopping criterion for
random testing.
Acknowledgement. We thank Ilinca Ciupa, Andreas Leitner, Simon Poulding, and
Stephan van Staden for their insightful comments.
 