(i.e. recall and precision) of these approaches on larger collections so that
interested readers are referred to their publication for further details. The estimates they
published are as follows:
1. Information retrieval methods (Recall: high/Precision: medium)
2. Descriptive methods (Recall: high/Precision: high)
3. Denotational semantics methods (Recall: high/Precision: very high)
4. Operational semantics methods (Recall: high/Precision: very high)
5. Structural methods (Recall: very high/Precision: very high)
6. Topological methods (Recall: unknown¹/Precision: unknown)
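For readers who want to turn the qualitative ratings above into numbers for their own collection, the two measures can be computed per query as sketched below. This is a minimal illustration only: the function name `precision_recall` and the asset names in the usage lines are hypothetical and do not come from Mili et al.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: assets returned by the retrieval method
    relevant:  assets a human judged to satisfy the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant assets that were actually returned
    # Precision: share of the returned assets that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant assets that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: four assets returned, three relevant overall.
precision, recall = precision_recall(
    retrieved=["stack_a", "stack_b", "queue_a", "list_a"],
    relevant=["stack_a", "stack_b", "stack_c"],
)
```

A method rated "Recall: high/Precision: medium" in the list would, under this reading, return most of the relevant assets but mix in a noticeable share of irrelevant ones.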
Software retrieval is a specialization of information retrieval and hence it makes
sense to reuse methods from the latter area to perform a simple, purely text-based
retrieval of software assets. Descriptive methods go a small step further and rely on
external textual descriptions (i.e. metadata) of an asset. Mili et al. therefore regard
such descriptive methods as a subset of the information retrieval methods, but due
to the widespread use of this approach in practice and in the literature they created a
separate category for them. Denotational semantics methods use signatures (see e.g. [42]) or formal
specifications [43] of the indexed assets for retrieval. While signature matching is
widely seen as a practical tool in this context, as it uses the parameters and
return values exhibited in the interface of an artifact for matching, software retrieval
based upon the matching of formal specifications suffers from a variety of
disadvantages (such as difficulties in creating and evaluating them). Operational semantics
approaches that rely on the execution of the indexed software with sample input
values are certainly expensive to execute; however, they seem to be easily automatable.
Although appealing in theory, this approach definitely also comes with
some practical challenges: side effects, non-termination, the structure of the data
types used, dependencies, etc. can cause serious problems. Hence, it is
no surprise that the most well-known implementation so far, called Behavior
Sampling [30], was merely applied to simple mathematical functions of the C standard
library. Finally, structural methods do not deal with the code of the assets directly,
but rather with internal program patterns or designs. Since it is largely unclear how
to formulate queries for such an approach, it is not surprising that it has only rarely
been experimented with.
Overlap between the discussed classes can appear in various places, e.g.
between (3), (4) and (5), as the “sampling” of components typically needs a
specific signature or structure to work with. As visible in the list, Mili et al. still
defined topological methods as an independent class of approaches; however, since
their common denominator is the distance between the query and the candidates, we
would prefer to describe them as an approach for ranking search results that can only
be used together with at least one concrete instance of the other approaches.
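Read this way, a topological method is a ranking step applied to candidates that another method has already retrieved. The sketch below illustrates that reading with a Jaccard distance over description tokens. The choice of distance, the function names and the sample candidates are assumptions made for illustration and are not prescribed by Mili et al.

```python
def jaccard_distance(query, description):
    """Distance in [0, 1] between two texts, based on shared lowercase tokens."""
    a = set(query.lower().split())
    b = set(description.lower().split())
    union = a | b
    if not union:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(a & b) / len(union)


def rank_by_distance(query, candidates):
    """Order (name, description) pairs by increasing distance to the query.

    The candidates are assumed to come from another retrieval method,
    e.g. a text-based or signature-based search.
    """
    return sorted(candidates, key=lambda item: jaccard_distance(query, item[1]))


# Hypothetical candidates returned by some other retrieval method.
candidates = [
    ("hash", "compute hash of string"),
    ("isort", "sort list of integers ascending"),
    ("lsort", "sort linked list"),
]
ranked = rank_by_distance("sort list of integers", candidates)
```

Any other distance between query and candidate (for example one defined on signatures or specifications) could be substituted for `jaccard_distance` without changing the ranking step itself.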
¹ For topological methods it is difficult to define or estimate recall and precision. See [26] for more.