2) exploiting all available information, in the form of existing full or partial
architecture documentation, in order to improve the performance of predictive
algorithms. The few existing fully documented software systems can be used as
gold standards representing how a correct architecture recovery should appear.
The problem can be framed in terms of supervised clustering [15]: gold standards
are examples of inputs (the code) and desired outputs (its architectural
organization), used to train a predictive machine that tries to approximate the
desired output when fed with the code. In doing so, the predictor adapts the
similarity measure to improve the approximation. When presented with a new piece
of code, the trained machine clusters it using the learned similarity measure.
We plan to extend this supervised clustering paradigm, mostly developed for flat
clustering, to produce a hierarchy of clusters. Partial architecture
documentation can be used in a similar fashion by turning the supervised
learning problem into a semi-supervised one: the algorithm is trained to output
a full architectural representation that is consistent with the partial
information available, possibly accounting for inconsistencies due to labeling
errors or ambiguity.
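To make the idea of a learned similarity measure concrete, the following is a
minimal sketch and not the structured-output formulation of [15]. It assumes
that each code unit is already described by a small numeric feature vector (the
two features used here are hypothetical), and it replaces structured learning
with a much simpler pairwise logistic model: gold-standard pairs are labeled as
"same component" or "different component", per-feature weights are fitted by
gradient descent, and a new system is then clustered by linking every pair whose
learned similarity exceeds a threshold. All function names and data are
illustrative assumptions.

```python
import math
from itertools import combinations

def pair_features(a, b):
    """Per-feature absolute differences between two code units."""
    return [abs(x - y) for x, y in zip(a, b)]

def similarity(weights, bias, a, b):
    """Learned similarity in (0, 1): high when a and b likely share a cluster."""
    z = bias - sum(w * d for w, d in zip(weights, pair_features(a, b)))
    return 1.0 / (1.0 + math.exp(-z))

def train(gold_systems, n_features, epochs=1000, lr=1.0):
    """Fit weights and bias by gradient descent on the pairwise logistic loss."""
    pairs = []
    for vectors, labels in gold_systems:
        for i, j in combinations(range(len(vectors)), 2):
            same = 1.0 if labels[i] == labels[j] else 0.0
            pairs.append((vectors[i], vectors[j], same))
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for a, b, same in pairs:
            err = similarity(weights, bias, a, b) - same
            for k, d in enumerate(pair_features(a, b)):
                grad_w[k] -= err * d  # z depends on w_k through -w_k * d_k
            grad_b += err
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(pairs)
    return weights, bias

def cluster(vectors, weights, bias, threshold=0.5):
    """Flat clustering: connected components of the 'similar enough' graph."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if similarity(weights, bias, vectors[i], vectors[j]) > threshold:
            parent[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(vectors))]

# Hypothetical gold standard: feature 0 tracks the documented component,
# feature 1 is unrelated to it.
gold = [([(0.0, 0.0), (0.1, 1.0), (1.0, 0.0), (0.9, 1.0)], [0, 0, 1, 1])]
weights, bias = train(gold, n_features=2)

# Hypothetical new, undocumented system clustered with the learned measure.
new_system = [(0.05, 0.0), (1.0, 0.0), (0.0, 1.0), (0.95, 1.0)]
new_labels = cluster(new_system, weights, bias)
```

In this toy setting the training step should assign a large weight to the
feature that agrees with the gold-standard decomposition, so that the new system
is grouped along that feature. Sweeping the threshold from high to low merges
components progressively, which gives a crude nested grouping; this is only an
illustration of the direction discussed above, not a proposed hierarchical
method.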
8 Conclusions
Software maintenance is a key phase of the software development lifecycle, and
consequently many research efforts are devoted to providing new solutions to
improve its effectiveness. In this paper we dealt with the problem of developing
automated approaches for two typical software maintenance tasks, namely Software
Architecture Recovery (SAR) and Clone Detection. We focused on kernel methods,
using them as a powerful and flexible tool for measuring “similarity” between
code fragments, a main ingredient of the clustering algorithms that are widely
used in SAR and clone detection approaches. In particular, we presented
promising results in clone detection using Tree Kernels over modified ASTs,
together with a new method for the generation of labeled training sets. As for
SAR, we discussed how to adapt our structured kernels to the problem at hand,
suggesting a number of directions to leverage the full power of
structured-output machine learning techniques.
References
1. Anquetil, N., Fourrier, C., Lethbridge, T.C.: Experiments with clustering as a
software remodularization method. In: Proceedings of the 6th Working Conference
on Reverse Engineering, pp. 235-255. IEEE Computer Society, Washington, DC
(1999)
2. Baker, B.: On finding duplication and near-duplication in large software systems.
In: IEEE Proceedings of the Working Conference on Reverse Engineering (1995)
3. Baxter, I.D., Yahin, A., Moura, L., Sant'Anna, M., Bier, L.: Clone detection using
abstract syntax trees. In: Proceedings of the International Conference on Software
Maintenance, pp. 368-377. IEEE Press (1998)
 