Our earliest work on code retrieval on the web was an extension of our previous
work on code search within an IDE. In preparation for an empirical study,
we conducted a literature search and found relevant papers in many
disciplines. Although we used software engineering and program comprehension
as a starting point, we were quickly led to software reuse, human-computer
interaction (HCI), information retrieval, and even further afield to areas such as
consumer behavior.
One observation that we made repeatedly, both during the literature search and
in our subsequent research, is that there were two kinds of code search on the web.
One kind would be recognized by the software reuse community and involved the
reuse of source code in components with little or no modification. However, these
components or projects were not necessarily reused as intended by their original
designers. The other kind would be more familiar to those in HCI, where source
code is used as raw material in a creative process. The title and organization of this
topic reflect this division.
In this chapter, we give an introduction to code retrieval on the web. In Sect. 1.2,
we give some context to the emergence of this area. Next, we explain the organization
of this topic and give an overview of the chapters.
1.2 Emergence of Code Search on the Web
Code search has long been a critical part of software development. A study of software
engineering work practices found that searching was the most common activity
for software engineers [35]. They were typically locating a bug or a problem, finding
ways to fix it, and then evaluating the impact on other segments. Program comprehension,
code reuse, and bug fixing were cited as the chief motivations for source
code searching in that study. A related study on source code searching found that the
search goals cited frequently by developers were code reuse, defect repair, program
understanding, feature addition, and impact analysis [28]. The same study found that
programmers were most frequently looking for function definitions, variable definitions,
all uses of a function, and all uses of a variable.
The recognition that search is powerful and useful has led to advances in code
search tools. Software developers have needed tools to search through source code
since the appearance of interactive programming environments. Such tools started with
simple keyword search; when regular expressions were added, it became possible
to specify patterns and context [37]. An important improvement was made when
search techniques started using program structure, such as identifiers of variables
and functions, directly in expressing search patterns [1, 27].
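To make the progression concrete, the following is a minimal sketch in Python of the three styles of search just described: plain keyword search, regular-expression search, and search that uses program structure. It is an illustration only and does not reproduce any of the cited tools. The sample program and all function names (keyword_search, regex_search, find_definitions, find_calls) are hypothetical; the structural search relies on Python's standard ast module.

```python
import ast
import re

# A small, made-up program to search through.
SOURCE = '''\
def total(items):
    count = 0
    for item in items:
        count += item
    return count

def report(items):
    # total is mentioned in this comment
    print("total:", total(items))
'''


def keyword_search(source, word):
    """Return 1-based numbers of the lines whose text contains `word`."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if word in line]


def regex_search(source, pattern):
    """Return 1-based numbers of the lines matching a regular expression."""
    rx = re.compile(pattern)
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if rx.search(line)]


def find_definitions(source, name):
    """Structural search: lines where a function called `name` is defined."""
    tree = ast.parse(source)
    return [node.lineno for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef) and node.name == name]


def find_calls(source, name):
    """Structural search: lines where `name` is called as a plain function."""
    tree = ast.parse(source)
    return sorted(node.lineno for node in ast.walk(tree)
                  if isinstance(node, ast.Call)
                  and isinstance(node.func, ast.Name)
                  and node.func.id == name)
```

In this sketch, the keyword search for "total" also reports the comment and the string literal, the regular expression narrows the match to the definition by adding context, and the structural searches distinguish a definition from a use without relying on textual layout.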
Another approach to syntactic search involves processing the program and storing
facts in a database file of entity-relations [6, 23]. Alternatively, the code can
be parsed and transformed into other representations, such as data flow graphs or
control flow graphs, and searches can be performed on those structures [26]. While
some of these ideas have not been widely adopted, searches using regular expressions
and program structure are standard in modern IDEs.
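The fact-database approach can be sketched as follows. This is a simplified illustration under our own assumptions, not the design of the cited systems: the table layout, the fact kinds ("defines_function", "calls", "assigns_variable"), and the helper names build_fact_db and query are all hypothetical. The program is parsed once with Python's ast module, the extracted facts are stored in an in-memory SQLite database, and searches then become ordinary queries.

```python
import ast
import sqlite3

# A small, made-up program to extract facts from.
SOURCE = '''\
def area(w, h):
    return w * h

def main():
    size = area(2, 3)
    print(size)
'''


def build_fact_db(source):
    """Parse `source` and store (kind, name, line) facts in SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE facts (kind TEXT, name TEXT, line INTEGER)")
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            rows.append(("defines_function", node.name, node.lineno))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            rows.append(("calls", node.func.id, node.lineno))
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            rows.append(("assigns_variable", node.id, node.lineno))
    db.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    return db


def query(db, kind, name):
    """Return the sorted line numbers of all facts of `kind` about `name`."""
    cur = db.execute(
        "SELECT line FROM facts WHERE kind = ? AND name = ? ORDER BY line",
        (kind, name),
    )
    return [row[0] for row in cur]
```

Once the facts are stored, a question such as "where is area called?" is answered by query(db, "calls", "area") without parsing the program again, which is the main appeal of separating fact extraction from search.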