Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability - Trustworthy Eternal Systems via Evolving, Software Data and Knowledge

2) exploiting all available information, in terms of existing full or partial architecture documentation, in order to improve performance of predictive algorithms. The few existing fully documented software systems can be used as gold standards representing how a correct architecture recovery should appear. The problem can be framed in terms of supervised clustering [15]: gold standards are examples of inputs (the code) and desired outputs (its architectural organization), used to train a predictive machine trying to approximate the desired output when fed with the code. In doing so, the predictor adapts the similarity measure to improve the approximation. When presented with a new piece of code, the trained machine clusters it using the learned similarity measure. We plan to extend this supervised clustering paradigm, mostly developed for flat clustering, to produce a hierarchy of clusters. Partial architecture documentation can also be used in a similar fashion by turning the supervised learning problem into a semi-supervised one: the algorithm is trained to output a full architectural representation which is consistent with the partial information available, possibly accounting for inconsistencies due to labeling errors or ambiguity.
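The supervised clustering idea described above can be illustrated with a minimal sketch. It is not the method of [15] or of this paper: it assumes, purely for illustration, that each code unit is already summarized as a numeric feature vector, that similarity is a weighted L1 distance, and that weights are fitted with a simple perceptron-style rule. The function names (`learn_distance`, `cluster`), the labels `"ui"` and `"db"`, and all numbers are hypothetical. The output is a flat clustering only; the hierarchical and semi-supervised extensions are not covered.

```python
from itertools import combinations


def pair_diff(x, y):
    """Per-feature absolute difference between two feature vectors."""
    return [abs(a - b) for a, b in zip(x, y)]


def learn_distance(gold, epochs=50, lr=0.1):
    """Fit feature weights w and a threshold t from a gold-standard clustering.

    gold: list of (feature_vector, cluster_label) pairs (hypothetical format).
    Goal: weighted distance < t for same-cluster pairs, > t for the others.
    """
    dim = len(gold[0][0])
    w, t = [1.0] * dim, 1.0
    pairs = [(pair_diff(x, y), lx == ly)
             for (x, lx), (y, ly) in combinations(gold, 2)]
    for _ in range(epochs):
        for diff, same in pairs:
            dist = sum(wi * di for wi, di in zip(w, diff))
            if same and dist >= t:
                # Same cluster but too far apart: shrink weights, raise threshold.
                w = [max(0.0, wi - lr * di) for wi, di in zip(w, diff)]
                t += lr
            elif not same and dist <= t:
                # Different clusters but too close: grow weights, lower threshold.
                w = [wi + lr * di for wi, di in zip(w, diff)]
                t = max(0.0, t - lr)
    return w, t


def cluster(items, w, t):
    """Single-link clustering: link items whose learned distance is below t."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        dist = sum(wk * dk for wk, dk in zip(w, pair_diff(items[i], items[j])))
        if dist < t:
            parent[find(i)] = find(j)
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(items))]


# Toy gold standard: feature 0 separates the clusters, feature 1 is noise.
gold = [([0.0, 5.0], "ui"), ([0.1, 0.0], "ui"),
        ([1.0, 5.1], "db"), ([0.9, 0.2], "db")]
w, t = learn_distance(gold)

# New, unlabeled code units are clustered with the learned distance.
new_items = [[0.05, 9.0], [0.0, 1.0], [0.95, 9.0], [1.0, 0.5]]
print(cluster(new_items, w, t))
```

On this toy data the learned weights downplay the noisy second feature, so the new items are grouped by the first feature, whereas the unweighted distance would group them differently. A kernel over structured code representations would take the place of the hand-made feature vectors in a realistic setting.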

8 Conclusions

Software Maintenance is a key phase of the software development lifecycle, and consequently many research efforts are devoted to providing new solutions to improve its effectiveness. In this paper we dealt with the problem of developing automated approaches for addressing two typical Software Maintenance tasks, namely Software Architecture Recovery and Clone Detection. We focused on Kernel methods, using them as a powerful and flexible tool for measuring “similarity” between code fragments, a main ingredient of the clustering algorithms that are widely used in SAR and clone detection approaches. In particular, we presented promising results in clone detection using Tree Kernels over modified ASTs, together with a new method for the generation of labeled training sets. As for SAR, we discussed how to adapt our structured kernels to the problem at hand, suggesting a number of directions to leverage the full power of structured-output machine learning techniques.

References

1. Anquetil, N., Fourrier, C., Lethbridge, T.C.: Experiments with clustering as a software remodularization method. In: Proceedings of the 6th Working Conference on Reverse Engineering, pp. 235-255. IEEE Computer Society, Washington, DC (1999)

2. Baker, B.: On finding duplication and near-duplication in large software systems. In: IEEE Proceedings of the Working Conference on Reverse Engineering (1995)

3. Baxter, I.D., Yahin, A., Moura, L., Sant'Anna, M., Bier, L.: Clone detection using abstract syntax trees. In: Proceedings of the International Conference on Software Maintenance, pp. 368-377. IEEE Press (1998)
