Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability - Trustworthy Eternal Systems via Evolving, Software Data and Knowledge

6.2 Threats to Validity

In this section, we focused our attention on construct validity and external validity. Construct validity threats concern the relationship between theory and observation. Precision, Recall, and F1 adequately reflect the performance of the proposed approach. However, the data set used was obtained by manually removing the existing clones from the source code and then introducing new clones of Types 1, 2, and 3 in a controlled way. The mutations performed may bias the results, since they could affect the values of these measures. The mutation approach was, however, designed to reduce this effect on the results as much as possible.
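To make the three measures concrete, the following minimal sketch scores detected clone pairs against the pairs injected by the mutation process. It assumes, hypothetically, that a clone is represented as an unordered pair of fragment identifiers and that a detected pair counts as correct only when it exactly matches an injected pair; the study may match clones differently (e.g., by fragment overlap), and the fragment names in the example are invented.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical representation: a clone pair is an unordered pair of fragment ids.
ClonePair = FrozenSet[str]


def as_pairs(pairs: Iterable[Tuple[str, str]]) -> Set[ClonePair]:
    """Normalize (a, b) tuples so that (a, b) and (b, a) are the same pair."""
    return {frozenset(p) for p in pairs}


def precision_recall_f1(detected: Iterable[Tuple[str, str]],
                        injected: Iterable[Tuple[str, str]]) -> Tuple[float, float, float]:
    """Score detected clone pairs against the clones injected by mutation."""
    found = as_pairs(detected)
    truth = as_pairs(injected)
    true_pos = len(found & truth)                      # correctly detected pairs
    precision = true_pos / len(found) if found else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    injected = [("A.foo", "B.foo"), ("C.bar", "D.bar"), ("E.baz", "F.baz")]
    detected = [("B.foo", "A.foo"), ("C.bar", "D.bar"), ("G.qux", "H.qux")]
    print(precision_recall_f1(detected, injected))
```

Here two of the three detected pairs match injected clones and one injected clone is missed, so all three measures equal 2/3. Because the injected set is the ground truth, any bias introduced by the mutations propagates directly into these values, which is the construct validity concern raised above.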

To better understand the achieved results, we also plan to assess their validity using different measures that capture various aspects of detection quality [4].

External validity threats concern the generalization of the results. An important threat is related to the software system studied. In particular, the size of the system and the fact that it was developed by a student may threaten the validity of the results. Also, the fact that the system was implemented in Java may limit the generalization of the results. To this end, we plan to replicate the case study on commercial software systems implemented in different programming languages. This will increase our awareness of the validity of applying kernel methods to the detection of software clones. Regarding scalability, we plan to study software systems of different sizes and clone densities in the future.

7 Future Work: Architecture Recovery

Recovering the architecture of a software system requires grouping together portions of code that jointly perform a certain function and identifying the structural organization of these functional modules. The problem can be naturally formalized in terms of hierarchical clustering (see Section 2). Within such a framework, we aim to improve over existing approaches by leveraging the following aspects:

1) exploiting the rich structure characterizing software projects, in terms of the hierarchical structuring of the code and of the relationships given by, e.g., function calls.

As already discussed for the clone detection problem (see Section 4), kernel methods are a natural candidate for learning problems involving richly structured objects. The promising results in clone detection using kernels on ASTs and PDGs are encouraging, showing the potential of structured kernels for uncovering similarities between fragments. The wider variability of the code found within functional modules requires adapting the kernels in order to detect such modules effectively. The problem can be addressed by combining kernel redesign with kernel learning approaches [19], where the similarity measure is not fully specified a priori but is learned from examples as a combination of similarity patterns (e.g., involving different types of lexical and structural information). Logic kernels [30,16] are particularly promising in this context, as they allow one to encode arbitrary domain knowledge concerning relationships between code fragments, from which similarity measures are to be learned.
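The pipeline outlined above (several similarity patterns, a combination learned from examples, then hierarchical clustering into modules) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: each code unit is summarized by two hypothetical sparse feature vectors (lexical tokens and called functions), each base kernel is a simple cosine kernel rather than an AST/PDG or logic kernel, the combination weights are set by kernel-target alignment with a few units whose module is known (one simple stand-in for the kernel learning approaches of [19]), and modules are formed by average-linkage agglomerative clustering on the kernel-induced distance. All unit and feature names are invented.

```python
import math
from typing import Dict, List, Optional, Sequence

Matrix = List[List[float]]
Features = Dict[str, float]  # hypothetical sparse features, e.g. token or callee counts


def cosine_kernel(vectors: Sequence[Features]) -> Matrix:
    """Normalized linear kernel: K[i][j] = <x_i, x_j> / (|x_i| |x_j|)."""
    def dot(u: Features, v: Features) -> float:
        return sum(w * v.get(key, 0.0) for key, w in u.items())

    norms = [math.sqrt(dot(v, v)) or 1.0 for v in vectors]  # avoid division by zero
    n = len(vectors)
    return [[dot(vectors[i], vectors[j]) / (norms[i] * norms[j]) for j in range(n)]
            for i in range(n)]


def frobenius(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product <A, B>_F."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def target_kernel(labels: Sequence[Optional[str]]) -> Matrix:
    """+1 if two units are known to share a module, -1 if known to differ, 0 if unlabeled."""
    n = len(labels)
    return [[0.0 if labels[i] is None or labels[j] is None
             else (1.0 if labels[i] == labels[j] else -1.0)
             for j in range(n)] for i in range(n)]


def learn_weights(kernels: Sequence[Matrix], target: Matrix) -> List[float]:
    """Weight each base kernel by its alignment with the target, clipped at 0 and normalized."""
    scores = []
    for k in kernels:
        denom = math.sqrt(frobenius(k, k) * frobenius(target, target))
        scores.append(max(frobenius(k, target) / denom, 0.0) if denom else 0.0)
    total = sum(scores)
    if total == 0:
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]


def combine(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Non-negative weighted sum of kernels, which is itself a valid kernel."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


def average_linkage(kernel: Matrix, num_clusters: int) -> List[List[int]]:
    """Agglomerative (average-linkage) clustering on d(i, j) = sqrt(K_ii + K_jj - 2 K_ij)."""
    n = len(kernel)
    dist = [[math.sqrt(max(kernel[i][i] + kernel[j][j] - 2 * kernel[i][j], 0.0))
             for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > num_clusters:
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    return clusters


if __name__ == "__main__":
    units = ["Parser.parse", "Parser.tokenize", "Db.query", "Db.connect"]
    lexical = [{"token": 2, "parse": 1}, {"token": 3, "lexer": 1},
               {"sql": 2, "query": 1}, {"sql": 1, "conn": 2}]
    structural = [{"Lexer.next": 1, "Ast.node": 1}, {"Lexer.next": 1},
                  {"Socket.send": 1}, {"Socket.open": 1, "Socket.send": 1}]
    kernels = [cosine_kernel(lexical), cosine_kernel(structural)]
    # Only three units carry a known module label; Db.connect is left unlabeled.
    weights = learn_weights(kernels, target_kernel(["parser", "parser", "db", None]))
    modules = average_linkage(combine(kernels, weights), num_clusters=2)
    print(weights, [[units[i] for i in c] for c in modules])
```

On this toy input the lexical kernel aligns slightly better with the labeled examples and so receives the larger weight, and the two Parser units and the two Db units end up in separate clusters. Stopping the merge loop at different values of `num_clusters` yields the successive levels of the hierarchy; a logic kernel or a structured kernel on ASTs could be dropped in as another entry of `kernels` without changing the rest of the sketch.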
