3 State-of-the-Art of Clone Detection Techniques
In this section we summarize research in the area of clone detection, grouping the
proposals according to the features they exploit to identify similarities among
software artifacts. Note that our goal here is not to provide an extensive analysis
of the clone detection approaches presented in the literature but to provide an
overview of the most important techniques together with a general background on
the problem, necessary to introduce the proposal presented in Section 4. An
exhaustive survey of clone detection tools and techniques is provided in [40].
Table 2. Overview of clone detection techniques

Approach                      Used Information   Technique
Ducasse et al. [14]           Textual            String matching
Johnson [22]                  Textual            String matching
Baker [2]                     Token              Pattern matching
Kamiya et al. [23]            Token              Suffix-tree matching
Yang [48]                     Syntactic          Dynamic Programming
Baxter et al. [3]             Syntactic          Tree Matching
Koschke et al. [27]           Syntactic          Suffix-tree AST
Bulychev et al. [6]           Syntactic          Anti-unification (NLP)
Jiang et al. [21]             Syntactic          LSH
Komondoor and Horwitz [25]    Structural         PDG Slicing
Krinke [28]                   Structural         PDG Heuristics
Gabel et al. [17]             Structural         PDG Slicing
Leitao [32]                   Combined           Software metrics
Wahler et al. [45]            Combined           Frequent Item-sets
Corazza et al. [9]            Combined           Tree Kernels (ML)
Roy and Cordy [39]            Combined           Code Transformation

Textual Based Approaches: Ducasse et al. [14] propose a language-independent
approach to detect code clones, based on line-based string matching and visual
presentation of the cloned code. A different approach is presented by Johnson
[22], who applies a string matching technique based on fingerprints to identify
exact repetitions of text in the source code of large software systems.
The main strength of these techniques lies in their efficiency and scalability,
which make them easily applicable to the analysis of large software systems.
However, their detection capabilities are limited to nearly identical textual
duplications (line by line). As a result, these approaches are scarcely usable
in practice.
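
To make the idea of line-based textual matching concrete, the following Python
fragment gives a minimal sketch. It is not the implementation of [14] or [22]:
the names normalize, find_textual_clones, window and SAMPLE are hypothetical, and
the choices of removing all whitespace, hashing fixed-size windows of lines, and
using SHA-1 as the fingerprint are assumptions made only for illustration.

```python
import hashlib
from collections import defaultdict


def normalize(line: str) -> str:
    """Remove all whitespace so that layout changes do not affect matching.

    Assumption for this sketch: comments are left in place.
    """
    return "".join(line.split())


def find_textual_clones(source: str, window: int = 4):
    """Return groups of 1-based start lines whose next `window` non-blank,
    normalized lines are textually identical."""
    numbered = [(no, normalize(text))
                for no, text in enumerate(source.splitlines(), start=1)]
    numbered = [(no, text) for no, text in numbered if text]  # drop blank lines

    buckets = defaultdict(list)
    for i in range(len(numbered) - window + 1):
        chunk = "\n".join(text for _, text in numbered[i:i + window])
        # Fingerprint of the window; identical windows share a bucket.
        fingerprint = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        buckets[fingerprint].append(numbered[i][0])

    return [starts for starts in buckets.values() if len(starts) > 1]


# Hypothetical input: lines 6-9 copy lines 1-4 with different indentation,
# while lines 11-14 are the same code with renamed identifiers.
SAMPLE = """\
total = 0
for v in values:
    total += v
print(total)

total = 0
for v in values:
        total += v
print(total)

count = 0
for w in values:
    count += w
print(count)
"""

if __name__ == "__main__":
    print(find_textual_clones(SAMPLE, window=4))  # [[1, 6]]
```

On this input the sketch reports the copy starting at line 6, which differs
from the original only in layout, but it misses the fragment starting at line
11, where identifiers were renamed. This illustrates the limitation discussed
above: purely textual matching only finds nearly identical duplications.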
Token Based Approaches: Baker [2] suggests an approach to identify duplications
and near-duplications (i.e., copies with slight modifications) in large
software systems. The proposed approach finds source code copies that are
substantially the same except for global substitutions. Similarly, Kamiya et al. [23]
 