Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability - Trustworthy Eternal Systems via Evolving, Software Data and Knowledge - page 129


Precision has been used to measure the correctness of the results, while completeness has been assessed with the recall measure. More precisely, Precision (P) and Recall (R) are given by:

$$P = \frac{\#\,\text{actual identified clones}}{\#\,\text{total candidate clones}}, \qquad R = \frac{\#\,\text{actual identified clones}}{\#\,\text{total actual clones}}.$$

To assess whether the approach is effective (RQ3), we computed a version of the F-measure where Precision and Recall have the same weight, namely

$$F_1 = 2 \cdot \frac{P \cdot R}{P + R}.$$
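As an illustration of the three measures, the following minimal Python sketch computes P, R and F1 from two sets of clone pairs. The function name `precision_recall_f1`, the representation of a clone as a pair of fragment names, and the toy data are assumptions made for this example only; they are not taken from the study.

```python
def precision_recall_f1(candidates, actual):
    """Compute (P, R, F1) for a clone detector.

    candidates: clone pairs reported by the detector (hypothetical representation)
    actual:     clone pairs known to be real clones
    """
    candidates = set(candidates)
    actual = set(actual)

    # "Actual identified clones": reported pairs that are real clones.
    identified = len(candidates & actual)

    precision = identified / len(candidates) if candidates else 0.0
    recall = identified / len(actual) if actual else 0.0

    # F1 gives Precision and Recall the same weight; guard against 0/0.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (invented for illustration): 4 reported pairs, 3 real clones,
# 2 of the reported pairs are real.
reported = {("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")}
truth = {("a", "b"), ("c", "d"), ("i", "j")}

p, r, f1 = precision_recall_f1(reported, truth)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

The guards for empty sets are a design choice of this sketch: they return 0.0 instead of raising a division error when no candidates are reported or no actual clones exist.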

6 Results and Threats to Validity

In this section we discuss the results we gathered by applying the approach to the different clone types, using different similarity thresholds for the detection. First the three research questions are addressed, then a discussion of how we handled the main threats to validity is presented.

Table 3. Summary statistics of the results

Clone Type   Threshold   Precision   Recall   F1
Type 1       N.A.        1.0         1.0      1.0
Type 2       0.7         0.6         0.9      0.7
Type 2       0.8         0.7         0.6      0.6
Type 3       0.7         0.6         0.8      0.7
Type 3       0.8         0.6         0.8      0.7
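As a quick consistency check, the F1 column of Table 3 can be recomputed from the Precision and Recall columns. The sketch below does this in Python. Because the table reports values rounded to one decimal, the recomputed figures can only be expected to agree approximately, and the tolerance of 0.05 is an assumption of this example.

```python
# Rows of Table 3: (clone type, threshold, precision, recall, reported F1).
# None stands for the "N.A." threshold of Type 1 clones.
table3 = [
    ("Type 1", None, 1.0, 1.0, 1.0),
    ("Type 2", 0.7, 0.6, 0.9, 0.7),
    ("Type 2", 0.8, 0.7, 0.6, 0.6),
    ("Type 3", 0.7, 0.6, 0.8, 0.7),
    ("Type 3", 0.8, 0.6, 0.8, 0.7),
]


def f1_score(precision, recall):
    """Balanced F-measure: 2 * P * R / (P + R), or 0.0 when P + R is 0."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Recompute F1 from the rounded P and R values of each row.
recomputed = [f1_score(p, r) for _, _, p, r, _ in table3]

for (clone_type, threshold, _, _, reported), value in zip(table3, recomputed):
    print(f"{clone_type} @ {threshold}: recomputed {value:.3f}, reported {reported}")
```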

6.1 Correctness, Completeness and Effectiveness of the Results

Since the Tree Kernel-based approach does not include any formatting detail in its internal source code representation, Type 1 clones include no variability, and thus no similarity threshold is necessary. With this kind of clone, it is easy to obtain an F-measure of 1.0.

Regarding the other two types of clones, some modifications in the identifiers (Type 2 and 3) and in statements (Type 3 only) have been performed. In these cases, larger values of the threshold (e.g. 0.9) produce a small number of candidates. As a consequence, the recall is low, since only code fragments which are very similar are considered as clones. This effect is particularly evident for Type 3 clones, where no clones at all are detected. On the other hand, threshold values like 0.7 and 0.8 lead to better performance. In particular, the value 0.7 seems to improve completeness without affecting correctness, and is therefore preferable. These results are closely comparable with those reported in [9] in terms of all three indicators we are considering, namely correctness, completeness and effectiveness, thus confirming the validity of artificially generated data.
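The effect of the threshold on the number of candidates and on recall can be sketched with a small Python example. Everything in it is hypothetical: the similarity scores, the fragment names, the helper `evaluate`, and the choice of `>=` as the comparison are assumptions for illustration, not the data or the implementation of the study. It is meant only to show why a stricter threshold yields fewer candidates and therefore lower recall.

```python
def evaluate(scored_pairs, actual, threshold):
    """Return (number of candidates, precision, recall) at a given threshold.

    scored_pairs: dict mapping a fragment pair to its similarity score
    actual:       set of pairs that are real clones
    A pair is a candidate when its score is >= threshold (assumed rule).
    """
    candidates = {pair for pair, score in scored_pairs.items() if score >= threshold}
    identified = len(candidates & actual)
    precision = identified / len(candidates) if candidates else 0.0
    recall = identified / len(actual) if actual else 0.0
    return len(candidates), precision, recall


# Invented similarity scores for five fragment pairs.
scores = {
    ("f1", "f2"): 0.95,
    ("f3", "f4"): 0.85,
    ("f5", "f6"): 0.75,
    ("f7", "f8"): 0.72,   # not a real clone
    ("f9", "f10"): 0.65,  # real clone, heavily modified
}
real_clones = {("f1", "f2"), ("f3", "f4"), ("f5", "f6"), ("f9", "f10")}

results = {t: evaluate(scores, real_clones, t) for t in (0.7, 0.8, 0.9)}
for t, (count, precision, recall) in results.items():
    print(f"threshold {t}: {count} candidates, P={precision:.2f}, R={recall:.2f}")
```

On this toy data, raising the threshold from 0.7 to 0.9 shrinks the candidate set from four pairs to one and lowers recall from 0.75 to 0.25.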
