Program comprehension tools can be categorized as (i) extraction tools such as
parsers (Rigi) [8], (ii) analysis tools for clustering, feature identification and slicing
(Bauhaus tool [2]), and (iii) presentation tools such as code editors, browsers and
visualizations [16, 20].
In their study of grep usage, Lethbridge et al. [13] report that searching within source
code is used to locate a bug or problem, to find ways to fix it, and then to evaluate
the impact of the fix on other parts of the code. Sim et al. [12] found that programmers
most frequently looked for function definitions, variable definitions, all uses of a
function and all uses of a variable.
However, none of these existing tools can search for software components by
functionality and purpose, which is the basic idea behind Internet-scale source code
searching.
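The lexical searches that Sim et al. describe (definitions and all uses of a name) can be approximated with grep-style regular expressions. The following is a minimal sketch, assuming Python source in which definitions begin with `def`; the helpers `find_definitions` and `find_uses` are hypothetical and are not tools from the cited studies.

```python
import re

def find_definitions(source: str, name: str) -> list[int]:
    """Return 1-based line numbers where `name` is defined with `def`."""
    # Assumption: Python-style definitions such as "def name(".
    pattern = re.compile(rf"^\s*def\s+{re.escape(name)}\s*\(")
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if pattern.search(line)]

def find_uses(source: str, name: str) -> list[int]:
    """Return 1-based line numbers where `name` is called, excluding definitions."""
    # Purely lexical: a whole-word match of the name followed by "(".
    call = re.compile(rf"\b{re.escape(name)}\s*\(")
    defs = set(find_definitions(source, name))
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if call.search(line) and i not in defs]

SAMPLE = """\
def parse(text):
    return text.split()

def main():
    words = parse("a b c")
    print(len(parse("x y")))
"""

print(find_definitions(SAMPLE, "parse"))  # [1]
print(find_uses(SAMPLE, "parse"))         # [5, 6]
```

Because the patterns match only names, such a search also reports calls inside comments or strings, and it cannot find a component by what it does: a query for "code that sorts a list" finds nothing unless the programmer already knows the function's name.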
3.2.4 Software Reuse
It is evident from the discussion so far that source code searching on the Internet has
more in common with software reuse than with traditional source code searching for
program understanding and bug fixing.
Reuse is a common motivation for Internet-scale source code searching [15].
Programmers do not want to "re-invent the wheel," especially when the open source
world allows reuse at every level of granularity: from a few lines of code to an entire
library, and from a single tool to an entire system.
In proprietary settings, reuse involved indexing and storing software components
so that they would be easy to retrieve and use (for example, the structured
classification technique of Prieto-Diaz [10]). Complex queries had to be formed to
retrieve such components, and translating requirements into search terms posed a
cognitive burden for software engineers. Fischer et al. [4] also discuss the gap
between the system model of the software and the user's situation model, which
makes it difficult for users to express a requirement in a language that the system
can understand. They also describe retrieval by reformulation, a continuous
refinement process that forms cues for retrieving components that are not well
defined initially.
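Retrieval by reformulation can be illustrated with a toy loop in which the user inspects the results and folds the index terms of a promising component back into the query. This is a simplified sketch under stated assumptions, not Fischer et al.'s system: the repository, its index terms, the overlap-count ranking, and the names `retrieve` and `reformulate` are all hypothetical.

```python
# Hypothetical toy repository: component name -> descriptive index terms.
REPOSITORY = {
    "QuickSort": {"sort", "array", "recursive", "in-place"},
    "MergeSort": {"sort", "array", "recursive", "stable"},
    "BinarySearch": {"search", "array", "sorted"},
    "LinkedListSort": {"sort", "list", "stable"},
}

def retrieve(query: set[str], k: int = 2) -> list[str]:
    """Rank components by number of shared terms (ties broken by name); return the top k."""
    scored = [(len(query & terms), name) for name, terms in REPOSITORY.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]

def reformulate(query: set[str], chosen: str) -> set[str]:
    """Refine the query with the terms of a result the user judged close to the need."""
    return query | REPOSITORY[chosen]

query = {"sort"}                      # vague initial requirement
print(retrieve(query))                # ['LinkedListSort', 'MergeSort']
query = reformulate(query, "MergeSort")
print(retrieve(query))                # ['MergeSort', 'QuickSort']
```

Starting from the vague query `{"sort"}`, picking MergeSort as the closest match turns its index terms into retrieval cues, so the second round ranks array-based recursive sorts first even though the user never typed those terms.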
The problem of discourse persists into the open source era, because the primary
methods of searching are still keywords and regular expressions. The support
provided for locating and comprehending software objects does not scale up to the
actual potential for reuse, even in open source projects.
Reuse of open source code occurs with the understanding that effort will be
expended in contextualizing, comprehending and modifying a piece of software,
whereas the traditional concept of reuse assumed little or no modification of
components. Another interesting difference is that in open source the options
available for a given search query are far more numerous than in a company-wide
repository of source code, which may or may not contain relevant reusable code.