able over the Web. Two important factors contributed to Sourcerer's success. First,
a principle of leveraging structural information in source code to build effective
search applications; this principle guided its design and implementation. Second, a
loosely coupled architecture that made it possible to selectively reuse smaller sets of
elements across applications.
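To make the first principle more concrete, the sketch below shows one common way of using structural information for ranking: a PageRank-style score computed over a graph of "uses" relations between code entities (for example, method calls or type references), so that entities many others depend on rank higher. This is a hypothetical illustration, not Sourcerer's ranking code; the function name `structural_rank`, the entity names, the damping factor of 0.85, and the fixed iteration count are all assumptions.

```python
# Hypothetical illustration (not Sourcerer's implementation): rank code
# entities by a PageRank-style score over "uses" edges (user -> used entity).

def structural_rank(uses, damping=0.85, iterations=50):
    """Return a score per entity; entities used by many others score higher.

    uses: dict mapping each entity to the list of entities it uses.
    Scores always sum to 1.
    """
    # Collect every entity that appears as a user or as a used entity.
    nodes = set(uses)
    for targets in uses.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every entity receives the same "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Entities that use nothing spread their score evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not uses.get(node))
        for node in nodes:
            new_rank[node] += damping * dangling / n
        # Each entity passes its score equally to the entities it uses.
        for source, targets in uses.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Usage with a tiny, made-up graph of Java-like entities.
graph = {
    "app.Main.run": ["util.Strings.trim", "io.Files.read"],
    "io.Files.read": ["util.Strings.trim"],
    "app.Report.print": ["util.Strings.trim"],
}
scores = structural_rank(graph)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {name}")
```

Because `util.Strings.trim` is used by every other entity, it ends up with the highest score, even though a purely text-based matcher would treat it like any other method containing the same keywords.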
While SCSE, CodeGenie, and SAS are three state-of-the-art research prototypes for
code search, Sourcerer does not address the needs of every code search application
that developers might want to build. For example, Sourcerer does not provide support
for information related to code evolution and changes, and therefore does not support
search requirements that arise from software evolution. Also, because it focuses
solely on Java, Sourcerer does not support search in other languages. Finally,
Sourcerer does not perform any form of de-duplication of source code in the repository
it maintains for the three code search applications. These are possible directions for
future improvement, both for Sourcerer and for next-generation code search
infrastructures.
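The missing de-duplication step can be illustrated with a simple content-hashing approach: normalize each file, hash the normalized text with SHA-256, and keep only the first file seen for each hash. This is a hypothetical sketch, not part of Sourcerer; the functions `normalize` and `deduplicate` and the normalization rules (dropping Java-style comments and collapsing whitespace) are assumptions. The regex-based comment stripping is deliberately naive, and the approach detects only files that are identical after normalization, not near-duplicates.

```python
import hashlib
import re

# Hypothetical normalization rules for Java-like source (an assumption, not
# Sourcerer's behavior). The regexes are naive: they also match comment-like
# text inside string literals.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")
WHITESPACE = re.compile(r"\s+")


def normalize(source: str) -> str:
    """Remove comments and collapse every whitespace run to a single space."""
    without_comments = LINE_COMMENT.sub("", BLOCK_COMMENT.sub("", source))
    return WHITESPACE.sub(" ", without_comments).strip()


def deduplicate(files: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split files into unique ones and duplicates.

    files maps a path to its source text. Returns (unique, duplicates):
    unique maps a content hash to the first path seen with that content, and
    duplicates maps each later path to the earlier path it duplicates.
    """
    unique: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for path, source in files.items():
        digest = hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()
        if digest in unique:
            duplicates[path] = unique[digest]
        else:
            unique[digest] = path
    return unique, duplicates


# Usage: two copies of the same class that differ only in comments/layout.
repo = {
    "projA/StringUtil.java": "class StringUtil {\n  // trims input\n"
                             "  String trim(String s) { return s.trim(); }\n}\n",
    "projB/StringUtil.java": "class StringUtil { /* copied */ "
                             "String trim(String s) { return s.trim(); } }",
    "projC/Other.java": "class Other {}",
}
unique, duplicates = deduplicate(repo)
print(duplicates)  # {'projB/StringUtil.java': 'projA/StringUtil.java'}
```

Hashing normalized text keeps the check cheap (one pass per file and a dictionary lookup), which matters at repository scale; detecting near-duplicates would need a different technique, such as token-based clone detection.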
Sourcerer's contents as well as its implementation are freely available for others
to use. The content is released as a citable dataset [21]. The implementation is
available as an open source project on GitHub [36]. These efforts have enabled
external researchers to use Sourcerer's content and services in their research
[22, 24, 27, 30, 33].
8.9 Further Reading
Descriptions of earlier versions of Sourcerer are available in [2] and [20]. SCSE was
first described in [1]. The code-specific heuristics used in SCSE and their formal
evaluation are discussed in [20] and [6]. Further details on CodeGenie are available in
earlier publications [17, 18]. For details on the user experiments and effectiveness
evaluation of CodeGenie, consult [6]. For a detailed discussion of the implementation
and evaluation of SSI, refer to [3]. More details on SAS are given in [4]. The definitive
resource on the Sourcerer infrastructure, in particular its research contributions and
all three code search applications presented earlier (SCSE, CodeGenie, and SAS), is the
author's doctoral dissertation [6]. A revised version of Chap. 3 of [6] appears in [5].
The chapter by Ossher and Lopes in this book provides the most recent and detailed
discussion of dependency slicing, one of the core services available in the Sourcerer
infrastructure. The software engineering research community has produced a large body
of work related to code search; a detailed review of the work most closely related to
Sourcerer is available in [6] (Chap. 1). Next, we summarize some of the work that
focused on building code search applications on top of a large-scale repository.
Merobase [14] is an infrastructure similar to Sourcerer. Like Sourcerer, Merobase
has built a large code repository, a code/component search engine, and a Test-Driven
Search application using its repository. Merobase offers syntax-aware code search
and covers additional languages (C++ and Ada). There is no documented evidence
that Merobase includes structural ranking such as Sourcerer Code Search Engine's