Is Branch Coverage a Good Measure of Testing Effectiveness? - Empirical Software Engineering and Verification


For most of the classes, the median does not reach 1. This indicates that different runs detect different faults (a median of 1 would mean that every run finds the same faults).

3.4 Similarity of Faults

As with branch coverage, we are interested in how similar the faults detected for the same class are across test runs. Detected faults are similar when different test runs find the same faults. Definitions of the fault detection vector, distance and similarity analogous to those of Section 3.2 are appropriate.

The fault detection vector of a class in a particular test run is a vector of $n$ elements, where $n$ is the total number of faults detected for that class over all runs. Because we do not know the actual number of faults in a class, we can only use the total number of faults found by AutoTest. Each vector element is 1 if the corresponding fault has been detected and 0 otherwise.
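
As an illustration, the construction of fault detection vectors can be sketched in Python. This is a minimal sketch, not AutoTest's implementation: it assumes each run's result is available as a set of hypothetical fault identifiers (here, strings keyed by a hypothetical run name), and it fixes one shared ordering of faults so that position $i$ refers to the same fault in every vector.

```python
from typing import Dict, List, Set


def fault_detection_vectors(runs: Dict[str, Set[str]]) -> Dict[str, List[int]]:
    """Map each run to a 0/1 vector over all faults found for the class in any run."""
    # The true number of faults is unknown, so the vector length n is the size
    # of the union of faults found across all runs.
    all_faults = sorted(set().union(*runs.values()))
    # Element i is 1 if this run detected fault i, 0 otherwise.
    return {
        run: [1 if fault in found else 0 for fault in all_faults]
        for run, found in runs.items()
    }
```

For example, two runs that found the faults {f1, f2} and {f2, f3} yield the vectors [1, 1, 0] and [0, 1, 1] over the ordering f1, f2, f3.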

Given two fault detection vectors $r$ and $s$ for the same class, in which the total number of found faults is $N_f$, the fault detection distance $D_f$ between $r$ and $s$ is defined as

$$D_f = \sum_{i=1}^{N_f} r_i \oplus s_i$$

where $r_i$ and $s_i$ are the values at the $i$-th position of $r$ and $s$, respectively. $D_f$ ranges from 0 to $N_f$.
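
The distance $D_f$ translates directly into code. A minimal sketch, assuming both vectors are sequences of 0/1 integers built over the same ordering of the $N_f$ faults (the function name is hypothetical):

```python
from typing import Sequence


def fault_detection_distance(r: Sequence[int], s: Sequence[int]) -> int:
    """D_f: number of positions where exactly one of the two runs detected the fault."""
    if len(r) != len(s):
        raise ValueError("vectors must cover the same N_f faults")
    # r_i XOR s_i is 1 exactly when the fault was found in one run but not the other.
    return sum(ri ^ si for ri, si in zip(r, s))
```

For the vectors [1, 1, 0] and [0, 1, 1], the two runs disagree on the first and third faults, so $D_f = 2$.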

The fault detection similarity between them is then defined as:

$$\frac{N_f - D_f}{N_f}$$

The fault detection similarity ranges from 0 to 1. The larger the similarity, the more faults are either detected in both test runs or missed by both. Fault detection similarity among more than two vectors is calculated in the same way as branch coverage similarity.
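
A sketch of the similarity measure follows, again assuming 0/1 vectors of equal length $N_f$. The definition for more than two vectors (Section 3.2) is not restated in this section, so the aggregation below, the mean similarity over all pairs of runs, is an assumption made purely for illustration and may differ from the one actually used; both function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def fault_detection_similarity(r: Sequence[int], s: Sequence[int]) -> float:
    """(N_f - D_f) / N_f for two fault detection vectors of equal length N_f."""
    n_f = len(r)
    if n_f == 0 or len(s) != n_f:
        raise ValueError("need two non-empty vectors of the same length")
    d_f = sum(ri ^ si for ri, si in zip(r, s))  # fault detection distance D_f
    return (n_f - d_f) / n_f


def mean_pairwise_similarity(vectors: Sequence[Sequence[int]]) -> float:
    """ASSUMPTION: aggregate more than two vectors as the mean over all pairs."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(fault_detection_similarity(a, b) for a, b in pairs) / len(pairs)
```

With the earlier example vectors [1, 1, 0] and [0, 1, 1], the similarity is (3 − 2)/3 ≈ 0.33; identical vectors give 1.0.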

Figure 6 shows the similarity of detected faults in different test runs for each class. The median of the fault detection similarity for all classes (the thick curve) ranges from 0.84 to 0.90. The figure indicates that most of the faults can be detected in every test run, but because the median does not reach 1.0, multiple test runs for a class are necessary to find as many faults as possible. Figure 7 shows the standard deviation of the fault detection similarity for each class. Its median (the thick curve) ranges from 0.07 to 0.05, corresponding to 8% to 5% of the median for all classes.

This implies that most faults are discovered by most test runs, but performing several runs produces better results. The choice of seed has a stronger impact on fault detection than on branch coverage.
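
Summary statistics of the kind plotted in Figures 6 and 7 can be computed as follows. This is a sketch under stated assumptions: the input is a hypothetical list of similarity values for one class, the sample standard deviation is used (the text does not say which variant was applied), and the spread is also expressed as a percentage of the median, mirroring how the 8% and 5% figures are phrased.

```python
import statistics
from typing import Sequence, Tuple


def summarize(similarities: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, standard deviation, standard deviation as % of the median)."""
    med = statistics.median(similarities)
    # Sample standard deviation; requires at least two values.
    sd = statistics.stdev(similarities)
    if med == 0:
        raise ValueError("relative spread is undefined for a zero median")
    return med, sd, 100.0 * sd / med
```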

3.5 Correlation between Branch Coverage and Number of Faults

Here we take a closer look at the correlation between branch coverage and the number of detected faults. Although higher coverage does uncover more faults overall, it is clearly not a sufficient indicator of testing effectiveness.
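
The text does not state which correlation coefficient is used. As one common choice, and purely as an assumption, a Pearson coefficient between per-run branch coverage and number of detected faults can be computed as below; a rank-based coefficient such as Spearman's would be a reasonable alternative if the relationship is monotonic but not linear. The function name and any inputs passed to it are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between, e.g., coverage values and fault counts per run."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and spreads, left unnormalized since the factors cancel.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)
```

A value near 1 would indicate that runs with higher coverage reliably find more faults; a weak correlation would support the point that coverage alone is not a sufficient indicator.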
