To provide a detailed overview of the different approaches, we present the
related literature below, organized by the information exploited in the
clustering process: structural information, lexical information, and their
combination.
Structural Based Approaches: The works proposed by Wiggerts [46] and
by Anquetil and Lethbridge [1] represent the first two contributions to
semi-automatic approaches for the clustering of software entities. In
particular, in [1] the authors present a comparative study of different
hierarchical clustering algorithms based on structural information. However,
the proposed solutions require human decisions (e.g., choosing the cutting
points of the dendrograms) to get the best partition of software entities
into clusters.
Maqbool and Babri in [34] highlight the features of hierarchical clustering
research in the context of software architecture recovery. Special emphasis is
placed on the analysis of different similarity and distance measures that could
be effectively used in clustering software artifacts. The main contribution of
the paper is, however, the analysis of two clustering-based approaches and
their experimental assessment. The discussed approaches try to reduce the
number of decisions to be taken during the clustering. The authors also
conducted an empirical evaluation of the clustering-based approaches on four
large software systems.
Mitchell and Mancoridis in [36] present a novel clustering algorithm, named
Bunch. Bunch produces system decompositions by applying search-based techniques
in combination with several heuristics, such as the coupling and cohesion of
the produced partitions, specifically designed for the clustering of software
artifacts. In particular, the coupling and the cohesion heuristics are defined
in terms of inter- and intra-cluster dependencies, respectively. The produced
partitions were evaluated through qualitative and quantitative empirical
investigations. Similarly, Dove et al. [12] propose a structural approach based
on genetic algorithms to group software entities in clusters.
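As a minimal illustration of the intra- and inter-cluster dependencies
mentioned above, the following Python sketch counts, for a given partition, the
dependencies that stay inside each cluster (a rough proxy for cohesion) and
those that cross cluster boundaries (a rough proxy for coupling). It is not
Bunch's objective function or its search procedure; the function name
intra_inter_counts, the dependency list, and the partition are all hypothetical
and chosen only for illustration.

```python
from collections import defaultdict


def intra_inter_counts(edges, cluster_of):
    """Count dependencies inside each cluster and across clusters.

    edges: iterable of (source, target) dependencies between entities.
    cluster_of: mapping from entity name to cluster identifier.
    Returns (intra, inter), where intra maps a cluster id to the number of
    dependencies internal to it and inter is the number of dependencies
    that cross a cluster boundary.
    """
    intra = defaultdict(int)
    inter = 0
    for src, dst in edges:
        if cluster_of[src] == cluster_of[dst]:
            intra[cluster_of[src]] += 1
        else:
            inter += 1
    return dict(intra), inter


# Hypothetical dependency graph with five entities split into two clusters.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("c", "d")]
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

intra, inter = intra_inter_counts(edges, cluster_of)
print(intra, inter)
```

A search-based approach would, under these assumptions, explore alternative
partitions and prefer those with many internal and few crossing dependencies.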
Clustering algorithms based on structural information have also been used
in the analysis of the software architecture evolution [5], [47]. Wu et al. in [47]
present a comparative study of a number of clustering algorithms: (a)
hierarchical agglomerative clustering algorithms based on the Jaccard
coefficient and the single/complete linkage update rules; (b) an algorithm
based on program
comprehension patterns that tries to recover subsystems that are commonly
found in manually-created decompositions of large software systems; and (c) a
customized configuration of an algorithm implemented in Bunch [36]. Similarly,
Bittencourt and Guerrero [5] present an empirical study to evaluate four widely
known clustering algorithms on a number of software systems implemented in
Java and C/C++. The analyzed algorithms are edge betweenness clustering,
k-means clustering, modularization quality clustering, and design structure
matrix clustering.
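To make the hierarchical agglomerative scheme with the Jaccard coefficient and
the single/complete linkage rules more concrete, here is a minimal Python
sketch. It is not the implementation used in any of the cited studies. It
assumes each software entity is described by a set of features (here,
hypothetical included header names) and that merging stops at a caller-chosen
number of clusters k, which stands in for the dendrogram cut point. The names
jaccard, agglomerate, and the sample data are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def agglomerate(features, k, linkage="single"):
    """Merge the most similar clusters until only k remain.

    features: mapping from entity name to its set of features.
    linkage: "single" scores two clusters by their most similar pair,
             "complete" by their least similar pair.
    """
    clusters = [[name] for name in sorted(features)]
    pick = max if linkage == "single" else min

    def similarity(c1, c2):
        return pick(jaccard(features[x], features[y]) for x in c1 for y in c2)

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = max(pairs, key=lambda p: similarity(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical entities described by the headers they include.
features = {
    "parser.c": {"lexer.h", "ast.h"},
    "lexer.c": {"lexer.h", "ast.h", "io.h"},
    "net.c": {"socket.h", "io.h"},
    "http.c": {"socket.h", "io.h", "url.h"},
}

result = agglomerate(features, k=2, linkage="single")
print(result)
```

On this small input both linkage rules group the two parsing files together
and the two networking files together; on larger systems the two rules can
produce noticeably different decompositions.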
Lexical Based Approaches: Software clustering approaches exploiting lexical
information are based on the idea that the lexicon provided by developers in the
 