6.2 Threats to Validity
In this section we focus on construct validity and external validity.
Construct validity threats concern the relationship between theory and
observation. Precision, Recall, and F1 reflect the performance of the
proposed approach well. However, the data set was obtained by manually
removing the existing source clones and then introducing new clones of
Types 1, 2 and 3 in a controlled way. The performed mutations may bias the
results, since they could affect the values of these measures. The mutation
approach was nevertheless designed to reduce this effect on the results as
much as possible.
To increase our confidence in the achieved results, we also plan to assess
their validity using different measures that capture various aspects of
detection quality [4].
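As an illustration of the measures discussed in this section, Precision,
Recall, and F1 can be computed from the set of clone pairs reported by a
detector and the set of clone pairs known to be present in the data set. The
following is only a minimal sketch: the function name, the representation of
a clone pair as a tuple of fragment identifiers, and the example identifiers
are assumptions made for illustration, not part of the evaluated tool.

```python
def detection_scores(detected, reference):
    """Return (precision, recall, f1) for two collections of clone pairs.

    Each pair is assumed to be a tuple of fragment identifiers written in a
    canonical order, so that the same pair always compares equal.
    """
    detected, reference = set(detected), set(reference)
    true_positives = len(detected & reference)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical fragment identifiers, for illustration only.
reference = {("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")}
detected = {("a", "b"), ("c", "d"), ("x", "y")}
precision, recall, f1 = detection_scores(detected, reference)
```

In this toy case two of the three reported pairs are correct and two of the
four injected pairs are found, so Precision is 2/3, Recall is 1/2, and F1 is
their harmonic mean, 4/7.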
External validity threats concern the generalization of the results. An
important threat is related to the studied software system. In particular,
its size and the fact that it was developed by a student may threaten the
validity of the results. The fact that the system was implemented in Java
may also limit the generalization of the results. To address these threats,
we plan to replicate the case study on commercial software systems
implemented in different programming languages. This will increase our
confidence in the validity of applying Kernel Methods to the detection of
software clones. Regarding scalability, software systems of different sizes
and clone densities will be studied in the future.
7 Future Work: Architecture Recovery
Recovering the architecture of a software system requires grouping together
portions of code that jointly perform a certain function and identifying the
structural organization of these functional modules. The problem can be
naturally formalized in terms of hierarchical clustering (see Section 2).
Within this framework, we aim at improving over existing approaches by
leveraging the following aspects:
1) exploiting the rich structure characterizing software projects, in terms
of the hierarchical structuring of the code and of relationships given by,
e.g., function calls. As already discussed for the clone detection problem
(see Section 4), Kernel Methods are a natural candidate for learning
problems involving richly structured objects. The results obtained in clone
detection using kernels on AST and PDG are encouraging, showing the
potential of structured kernels in uncovering similarities between
fragments. The wider variability of code found within functional modules
requires an adaptation of kernels in order to detect them effectively. The
problem can be addressed by combining kernel redesign with kernel learning
approaches [19], where the similarity measure is not fully specified a
priori, but is learned from examples as a combination of similarity patterns
(e.g. involving different types of lexical and structural information).
Logic kernels [30,16] are particularly promising in this context, as they
allow encoding arbitrary domain knowledge concerning relationships between
code fragments, from which similarity measures are to be learned.
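
The idea of a similarity measure built as a combination of similarity
patterns can be sketched as a non-negative weighted sum of base kernels,
which is itself a valid kernel. The sketch below is illustrative only: the
two base kernels (shared distinct tokens and a bag of node types), the
fragment representation, and the fixed weights are assumptions. In a kernel
learning setting the weights would be learned from examples, a step omitted
here, and the base kernels would be the structured kernels on AST and PDG
rather than these simplified stand-ins.

```python
from collections import Counter
from math import sqrt


def lexical_kernel(x, y):
    """Number of distinct tokens shared by two fragments."""
    return float(len(set(x["tokens"]) & set(y["tokens"])))


def structural_kernel(x, y):
    """Dot product of node-type counts (a bag-of-node-types kernel)."""
    cx, cy = Counter(x["node_types"]), Counter(y["node_types"])
    return float(sum(cx[t] * cy[t] for t in cx))


def normalized(kernel):
    """Wrap a kernel so that k(x, x) == 1 whenever kernel(x, x) > 0."""
    def k(x, y):
        denom = sqrt(kernel(x, x) * kernel(y, y))
        return kernel(x, y) / denom if denom > 0 else 0.0
    return k


def combined_kernel(kernels, weights):
    """Non-negative weighted sum of base kernels."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    def k(x, y):
        return sum(w * base(x, y) for w, base in zip(weights, kernels))
    return k


# Hypothetical fragments; the weights are fixed by hand, not learned.
frag_a = {"tokens": ["int", "i", "=", "0"], "node_types": ["Decl", "Literal"]}
frag_b = {"tokens": ["int", "j", "=", "0"], "node_types": ["Decl", "Literal"]}
frag_c = {"tokens": ["return", "x"], "node_types": ["Return", "Name"]}

k = combined_kernel(
    [normalized(lexical_kernel), normalized(structural_kernel)],
    weights=[0.5, 0.5],
)
similarity_ab = k(frag_a, frag_b)
similarity_ac = k(frag_a, frag_c)
```

With these inputs, frag_a and frag_b share three of four tokens and have
identical node types, so their combined similarity is 0.5 * 0.75 + 0.5 * 1.0
= 0.875, while frag_a and frag_c share nothing and score 0.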