Is Branch Coverage a Good Measure of Testing Effectiveness? - Empirical Software Engineering and Verification

Hutchins et al. [13] also compared the effectiveness of the branch coverage criterion and the all-uses criterion. They found that for both criteria, test sets achieving coverage levels over 90% showed significantly better fault detection than randomly selected test sets of the same size. This means that many faults could be detected when the coverage level approaches 100%. They also concluded that, in terms of effectiveness, there is no winner between the branch coverage and all-uses criteria. Our results on the correlation between the branch coverage level and the number of detected faults also show a similar pattern: many faults are detected at higher coverage levels. In our experiment, however, the branch coverage level did not reach 100%, while in their study, manually written test sets guaranteed total branch coverage. Also, in their study, programs under test were seeded with faults, while in our experiment, programs were tested as they are.

Gupta et al. [9] compared the effectiveness (the ability to detect faults) and efficiency (the average cost for detecting a fault) of three code coverage criteria: predicate coverage, branch coverage and block coverage. They found that predicate coverage is the most effective but the least efficient, block coverage is the least effective but the most efficient, while branch coverage lies between predicate coverage and block coverage in terms of both effectiveness and efficiency. Their results suggest that branch coverage is the best of the three criteria for getting better results with moderate testing effort.
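
The two measures used in the comparison by Gupta et al. can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the class `CriterionResult`, the cost unit, and all numbers are hypothetical and were chosen solely to reproduce the reported ordering; they are not data from [9].

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """Hypothetical summary of testing under one coverage criterion."""
    name: str
    faults_detected: int  # faults revealed by the test sets
    total_faults: int     # known faults in the programs under test
    cost: float           # testing effort in an assumed unit, e.g. test cases


def effectiveness(r: CriterionResult) -> float:
    """Ability to detect faults: share of known faults that were detected."""
    return r.faults_detected / r.total_faults


def cost_per_fault(r: CriterionResult) -> float:
    """Average cost for detecting a fault; lower means more efficient."""
    if r.faults_detected == 0:
        return float("inf")
    return r.cost / r.faults_detected


# Invented numbers, for illustration only.
results = [
    CriterionResult("predicate", faults_detected=18, total_faults=20, cost=180.0),
    CriterionResult("branch", faults_detected=15, total_faults=20, cost=90.0),
    CriterionResult("block", faults_detected=10, total_faults=20, cost=40.0),
]

for r in results:
    print(f"{r.name:9s} effectiveness={effectiveness(r):.2f} "
          f"cost per fault={cost_per_fault(r):.1f}")
```

With these assumed inputs, predicate coverage has the highest effectiveness and the highest cost per fault, block coverage has the lowest of both, and branch coverage sits in between.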

7 Conclusions and Future Work

This article has shown that the branch coverage level achieved by random testing varies depending on the structure of the program under test but was very high on the classes we tested (93% on average). Most of the branches exercised by random testing are exercised very quickly (in the first 10 minutes of testing) regardless of the class under test. For the same class, branches exercised in different test runs are almost the same. Different test runs on the same class detect roughly 10% different faults.

Our results also confirm that branch coverage in general is not a good indicator of the quality of a test suite. In the experiments, more than 50% of the faults were detected while coverage was at a plateau. Although many studies have shown the weakness of branch coverage, there has been little evidence that random testing finds new faults while branch coverage stagnates.
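
One way to measure how many faults are found while coverage is at a plateau is sketched below. It is a minimal illustration, not the analysis used in the experiments: the function name, the timestamps, and in particular the definition of the plateau (everything after the last newly covered branch) are assumptions made for this example.

```python
def fraction_of_faults_on_plateau(branch_times, fault_times):
    """Share of faults first detected after coverage stopped growing.

    branch_times: minutes at which a new branch was first exercised
    fault_times:  minutes at which a new fault was first detected
    Assumption: the plateau starts when the last new branch is covered.
    """
    if not fault_times:
        return 0.0
    plateau_start = max(branch_times, default=0.0)
    late_faults = sum(1 for t in fault_times if t > plateau_start)
    return late_faults / len(fault_times)


# Invented timeline of a single test run, for illustration only.
branch_times = [0.1, 0.5, 1.0, 2.5, 8.0]
fault_times = [0.3, 2.0, 7.5, 12.0, 30.0, 55.0]

share = fraction_of_faults_on_plateau(branch_times, fault_times)
print(f"faults found on the plateau: {share:.0%}")
```

In this made-up run the last new branch is covered at minute 8, and three of the six faults are found later, so coverage alone would not have signalled that testing was still productive.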

Our results indicate that branch coverage is not a good stopping criterion for random testing. One should test a program in multiple test runs to find as many faults as possible, even though doing so will generally not increase the branch coverage level. Also, one should not stop random testing even if the branch coverage level stops increasing or increases only very slowly.
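
The effect of repeating test runs can be illustrated by tracking the cumulative sets of branches and faults across runs. The sketch below uses invented branch and fault identifiers and a hypothetical helper `cumulative_union_sizes`; it only shows the shape of the argument, not results from the experiments.

```python
def cumulative_union_sizes(runs):
    """Size of the union of all items seen after each successive run."""
    seen = set()
    sizes = []
    for items in runs:
        seen |= items
        sizes.append(len(seen))
    return sizes


# Invented data: three runs on the same class, for illustration only.
branches_per_run = [
    {"b1", "b2", "b3", "b4"},
    {"b1", "b2", "b3", "b4"},
    {"b1", "b2", "b3", "b4", "b5"},
]
faults_per_run = [
    {"f1", "f2", "f3"},
    {"f1", "f2", "f4"},
    {"f2", "f3", "f5"},
]

print("branches covered:", cumulative_union_sizes(branches_per_run))
print("faults detected: ", cumulative_union_sizes(faults_per_run))
```

In this example the set of covered branches barely changes from run to run, while each additional run still contributes a fault that earlier runs missed.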

For the continuation of this work, we are investigating how to reach even higher branch coverage (100% or very close), and how to devise a good stopping criterion for random testing.

Acknowledgement. We thank Ilinca Ciupa, Andreas Leitner, Simon Poulding, and Stephan van Staden for their insightful comments.
