Is Branch Coverage a Good Measure of Testing Effectiveness? - Empirical Software Engineering and Verification


decreases, although less dramatically. After 30 minutes, the branch coverage level only increases slightly, but many faults are detected in that period.

We also calculated the correlation between branch coverage and the normalized number of faults. It varies widely from class to class, from 0.3 to 0.97, and there seems to be no common pattern among the tested classes, as shown in Figure 9.
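As a rough illustration of the per-class correlation discussed above, the following sketch computes a correlation coefficient between a coverage series and a normalized fault series. It assumes Pearson's coefficient and a normalization by the largest fault count; the text does not state which coefficient or normalization was used. The function names and the sample data are invented for illustration and are not taken from the experiments.

```python
import math


def normalize(fault_counts):
    """Scale cumulative fault counts into [0, 1] by the largest count.

    Assumption: this is only one plausible reading of "normalized number of faults".
    """
    top = max(fault_counts)
    if top == 0:
        return [0.0 for _ in fault_counts]
    return [count / top for count in fault_counts]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented samples for one hypothetical class, taken at successive points in time.
coverage = [0.20, 0.45, 0.60, 0.62, 0.63]
faults = [1, 4, 6, 9, 12]
r = pearson(coverage, normalize(faults))
```

Running the same computation per class would give one coefficient per class, which is the kind of value compared across classes here.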

The implications of these results are twofold: (1) when coverage increases, the number of faults discovered increases as well; (2) when coverage stagnates, faults are still found. Thus increasing branch coverage clearly increases the number of faults found. A high branch coverage value is, however, clearly not sufficient to assess the quality of a testing session.

The next section further elaborates on these findings as well as their limitations.

4 Discussion

The results of the previous section provide material for answering three questions:

- Is branch coverage a good stopping criterion for random testing?

- Is it a good measure of testing effectiveness?

- What are the unexercised branches?

4.1 Branch Coverage as Stopping Criterion for Random Testing

Since random testing in general cannot achieve 100% branch coverage in finite time, total branch coverage is not a feasible stopping criterion. In practice, the percentage of code coverage is often used as an adequacy criterion: the higher the percentage, the more adequate the testing [19]; and testing can be stopped once the generated test suite has reached a certain level of adequacy. In our experiments, after 1 hour, the branch coverage level hardly increases, so it would be impractical to extend the testing time until full coverage is reached. Instead, the only reasonable way to use branch coverage would be to evaluate the expectation of finding new faults. As shown in the previous section, the number of faults evolves closely with the coverage only in the first few minutes of testing. In testing sessions longer than 10 minutes, the correlation degrades. In fact, about 50% of the faults are found in a period where the branch coverage level hardly increases any more. This means that branch coverage is not a good predictor of the number of faults remaining to be found.
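To make the argument concrete, the sketch below applies a simple coverage-plateau stopping rule to a time series and measures how many faults would be missed by stopping there. The rule, its parameters (`window`, `min_gain`), and the data are hypothetical and chosen only for illustration; they do not reproduce the experimental setup or its numbers.

```python
def plateau_stop_index(coverage, window=2, min_gain=0.03):
    """Return the first sample index at which coverage gained less than
    `min_gain` over the previous `window` samples, or None if it never does."""
    for i in range(window, len(coverage)):
        if coverage[i] - coverage[i - window] < min_gain:
            return i
    return None


def fraction_found_after(cumulative_faults, stop_index):
    """Fraction of all eventually found faults that appear after `stop_index`."""
    total = cumulative_faults[-1]
    if total == 0:
        return 0.0
    return (total - cumulative_faults[stop_index]) / total


# Invented series sampled at regular intervals of one hypothetical session.
coverage = [0.0, 0.55, 0.60, 0.62, 0.625, 0.63, 0.63, 0.632]
cumulative_faults = [0, 6, 9, 11, 13, 15, 17, 18]

stop = plateau_stop_index(coverage)
missed = fraction_found_after(cumulative_faults, stop)
```

With these invented numbers the rule stops as soon as coverage flattens, yet a noticeable share of the faults is found afterwards, which is the pattern the paragraph above describes.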

The correlation varies greatly from class to class. For some classes, such as BINARY SEARCH TREE, the correlation coefficient is 0.98 and the correlation is almost linear, but for others, such as ARRAYED STACK, the correlation is weak (0.3), especially with longer testing sessions. This variation across the classes under test reduces the precision of branch coverage as a stopping criterion.

Random testing also detects different faults in different test runs while it exercises almost the same branches. This confirms that multiple restarts drastically improve the number of faults found [5]: to find as many faults as possible, a class should be random-tested multiple times with different seeds, even if the same branches are exercised every time.
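The effect of restarting with different seeds can be sketched with a toy model. Everything here is hypothetical: `toy_unit` stands in for a class under test whose faults depend on rare input values rather than on the branch taken, and `random_session` stands in for one run of a random testing tool. It is not the tool or the classes used in the experiments.

```python
import random
from typing import Optional, Set, Tuple


def toy_unit(x: int) -> Tuple[str, Optional[str]]:
    """Hypothetical unit under test: returns (branch id, fault id or None)."""
    branch = "even" if x % 2 == 0 else "odd"
    # Faults are tied to rare input values, not to which branch is exercised.
    fault = f"fault@{x}" if x % 97 == 0 else None
    return branch, fault


def random_session(seed: int, n_calls: int = 50,
                   max_input: int = 10_000) -> Tuple[Set[str], Set[str]]:
    """One seeded random-testing session: (branches exercised, faults found)."""
    rng = random.Random(seed)
    branches: Set[str] = set()
    faults: Set[str] = set()
    for _ in range(n_calls):
        branch, fault = toy_unit(rng.randrange(max_input))
        branches.add(branch)
        if fault is not None:
            faults.add(fault)
    return branches, faults


def restart_sessions(seeds):
    """Accumulate branches and faults over independently seeded sessions."""
    all_branches: Set[str] = set()
    all_faults: Set[str] = set()
    per_run = []
    for seed in seeds:
        branches, faults = random_session(seed)
        per_run.append((seed, branches, faults))
        all_branches |= branches
        all_faults |= faults
    return all_branches, all_faults, per_run


all_branches, all_faults, per_run = restart_sessions(range(5))
```

In such a model each session tends to exercise the same two branches, while the set of faults accumulated over all seeds can only grow with each restart.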

Our conclusion is that branch coverage alone cannot be used as a stopping criterion

for random testing.
