Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability - Trustworthy Eternal Systems via Evolving Software, Data and Knowledge


To provide a detailed overview of the different approaches, we present the related literature below, organized by the information exploited in the clustering process: structural information, lexical information, and their combinations.

Structural Based Approaches: The works by Wiggerts [46] and by Anquetil and Lethbridge [1] are the first two contributions on semi-automatic approaches to clustering software entities. In particular, in [1] the authors present a comparative study of different hierarchical clustering algorithms based on structural information. However, the proposed solutions require human decisions (e.g., choosing the cut points of the dendrograms) to obtain the best partition of software entities into clusters.
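The role of the cut point can be illustrated with a minimal sketch. It assumes a dendrogram is given as a list of merges, each with the height at which it occurs; the entity names, the merge heights, and the function `cut_dendrogram` are hypothetical and are not taken from [1] or [46].

```python
def cut_dendrogram(entities, merges, cut_height):
    """Apply only the merges at or below cut_height and return the partition."""
    parent = {e: e for e in entities}

    def find(e):
        # Follow parent links to the cluster representative (with path halving).
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for left, right, height in merges:
        if height <= cut_height:
            parent[find(left)] = find(right)

    clusters = {}
    for e in entities:
        clusters.setdefault(find(e), set()).add(e)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical dendrogram: (entity in cluster 1, entity in cluster 2, merge height).
ENTITIES = ["parser.c", "lexer.c", "ast.c", "net.c", "socket.c"]
MERGES = [
    ("parser.c", "lexer.c", 0.2),
    ("net.c", "socket.c", 0.3),
    ("ast.c", "parser.c", 0.5),
    ("ast.c", "net.c", 0.9),
]

if __name__ == "__main__":
    # The same dendrogram yields different decompositions at different cut heights.
    for cut in (0.4, 0.6, 1.0):
        print(cut, cut_dendrogram(ENTITIES, MERGES, cut))
```

A low cut keeps many small clusters, while a high cut collapses everything into one; which height gives the "best" partition is exactly the decision left to the human in these approaches.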

Maqbool and Babri [34] highlight the features of hierarchical clustering research in the context of software architecture recovery. Special emphasis is placed on the analysis of different similarity and distance measures that could be used effectively in clustering software artifacts. The main contribution of the paper, however, is the analysis of two clustering-based approaches that try to reduce the number of decisions to be taken during clustering, together with their empirical evaluation on four large software systems.

Mitchell and Mancoridis [36] present a novel clustering algorithm named Bunch. Bunch produces system decompositions by applying search-based techniques in combination with several heuristics, such as the coupling and cohesion of the produced partitions, specifically designed for the clustering of software artifacts. In particular, the cohesion and coupling heuristics are defined in terms of intra- and inter-cluster dependencies, respectively. The produced partitions were evaluated through both qualitative and quantitative empirical investigations. Similarly, Dove et al. [12] propose a structural approach based on genetic algorithms to group software entities into clusters.
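The idea of searching for a partition that rewards intra-cluster dependencies and penalizes inter-cluster ones can be sketched as follows. This is a simplified illustration, not the Bunch implementation: the score uses a cluster-factor form commonly associated with modularization quality, and whether it matches the exact objective in [36] is an assumption. The function names, the toy dependency graph, and the fixed number of clusters are all hypothetical.

```python
import random


def modularization_quality(edges, assignment):
    """Sum, over clusters, of 2*intra / (2*intra + inter) (assumed MQ-style score)."""
    intra, inter = {}, {}
    for src, dst in edges:
        cs, cd = assignment[src], assignment[dst]
        if cs == cd:
            intra[cs] = intra.get(cs, 0) + 1  # dependency inside one cluster (cohesion)
        else:
            inter[cs] = inter.get(cs, 0) + 1  # dependency crossing clusters (coupling)
            inter[cd] = inter.get(cd, 0) + 1
    score = 0.0
    for cluster in set(assignment.values()):
        mu, eps = intra.get(cluster, 0), inter.get(cluster, 0)
        if mu > 0:
            score += 2 * mu / (2 * mu + eps)
    return score


def hill_climb(edges, nodes, n_clusters, seed=0):
    """Start from a random partition and move single nodes while the score improves."""
    rng = random.Random(seed)
    assignment = {n: rng.randrange(n_clusters) for n in nodes}
    best = modularization_quality(edges, assignment)
    improved = True
    while improved:
        improved = False
        for node in nodes:
            keep = assignment[node]
            for cluster in range(n_clusters):
                if cluster == keep:
                    continue
                assignment[node] = cluster
                score = modularization_quality(edges, assignment)
                if score > best + 1e-12:
                    best, keep, improved = score, cluster, True
            assignment[node] = keep  # settle on the best cluster found for this node
    return assignment, best


# Hypothetical dependency graph: two tightly connected pairs joined by one edge.
NODES = ["a", "b", "c", "d"]
EDGES = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("b", "c")]

if __name__ == "__main__":
    print(hill_climb(EDGES, NODES, n_clusters=2))
```

Hill climbing only reaches a local optimum, which is one reason search-based tools combine it with restarts or with population-based strategies such as genetic algorithms.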

Clustering algorithms based on structural information have also been used to analyze software architecture evolution [5], [47]. Wu et al. [47] present a comparative study of a number of clustering algorithms: (a) hierarchical agglomerative clustering algorithms based on the Jaccard coefficient and the single/complete linkage update rules; (b) an algorithm based on program comprehension patterns that tries to recover subsystems commonly found in manually created decompositions of large software systems; and (c) a customized configuration of an algorithm implemented in Bunch [36]. Similarly, Bittencourt and Guerrero [5] present an empirical study that evaluates four widely known clustering algorithms on a number of software systems implemented in Java and C/C++. The analyzed algorithms are edge betweenness clustering, k-means clustering, modularization quality clustering, and design structure matrix clustering.
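Option (a) above can be sketched as follows: entities are described by the set of entities they depend on, the distance is one minus the Jaccard coefficient, and clusters are merged bottom-up under single or complete linkage. For brevity the sketch recomputes cluster distances from scratch instead of applying incremental update rules, which gives the same merges. The feature sets, the threshold, and the function names are hypothetical and do not reproduce the setup in [47].

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerate(features, linkage="single", threshold=0.5):
    """Merge the closest clusters until the smallest distance exceeds threshold."""
    # Single linkage uses the closest pair of members, complete linkage the farthest.
    pick = min if linkage == "single" else max
    clusters = [frozenset([name]) for name in sorted(features)]

    def dist(c1, c2):
        return pick(jaccard_distance(features[x], features[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        pairs = [
            (dist(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        ]
        d, i, j = min(pairs)
        if d > threshold:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical structural features: each entity maps to the entities it depends on.
FEATURES = {
    "A": {"x", "y"},
    "B": {"y", "z"},
    "C": {"z", "w"},
    "D": {"p", "q"},
}

if __name__ == "__main__":
    print(agglomerate(FEATURES, linkage="single", threshold=0.7))
    print(agglomerate(FEATURES, linkage="complete", threshold=0.7))
```

On this toy input, single linkage chains A, B, and C together through B, whereas complete linkage keeps C apart because A and C share no dependencies; the choice of linkage rule therefore changes the recovered decomposition even with the same similarity measure.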

Lexical Based Approaches: Software clustering approaches exploiting lexical

information are based on the idea that the lexicon provided by developers in the
