283 MLOC, and Google Code Search with over 1 billion lines of code. These source
code search engines treat Internet-scale code searching in much the same manner as
code search within a single project in an integrated development environment.
However, other kinds of searches take place on the Internet, and we need to know
more about them.
This study was conducted to characterize Internet-scale source code searching:
What do developers look for? How do they find what they are looking for? What
tools do they use? When do they decide to search? To this end, we conducted a
questionnaire-based survey of software developers contacted using availability
sampling over the Internet. The design of this study is based on previous surveys by
Eisenstadt [3], and Sim, Clarke, and Holt [12]. Using an online questionnaire, we
collected data from over 70 programmers who were solicited using Google Groups
and mailing lists.
Their responses and anecdotes were analyzed systematically to find common
themes, or archetypes. An archetype is a concept from literary theory. It serves to
unify recurring images across literary works with a similar structure. In the context
of source code searching, an archetype is a theory to unify and integrate typical or
recurring searches. As with literature, a set of them will be necessary to characterize
the range of searching anecdotes.
We found that there are two major search archetypes and one minor one. The
first archetype was searching for a piece of code that can be reused, for example, a
text search engine or a graphical user interface (GUI) widget. The second archetype
was searching for reference information, that is, for examples of code to learn from.
In this archetype, developers are using the World Wide Web as a very large desk
reference. The minor archetype was searching for reports and repairs of bugs, i.e.,
patches. The two major archetypes had search targets that varied in size, while the
minor one did not. The search targets could be small-grained, such as a block of
code, medium-grained, such as a package, or large-grained, such as an entire system.
The results reported in this chapter are an extension of the work reported in an earlier
paper [21].
3.2 Related Work
The work in this paper has evolved from past research and current trends in software
development. The two trends that motivate this research are the increasing
availability of source code on the Internet, and the emergence of tools for accessing the
source code. The source code available on the web comes from open source projects,
web sites that support communities of practice, and language-specific archives.
Collectively, these sites contain billions of lines of code in countless languages. As
is the case with web pages, it can be difficult to locate a particular resource. General-
purpose search engines, such as Google and Yahoo!, can be used, but they do not
take advantage of structural information in the code. To fill this need, code-specific
search engines have been created. These software tools leverage the technology and