CodeRank, or advanced indexing techniques leveraging structural similarity such as
Sourcerer's SSI. Its Test-Driven Code Search (TDCS) application, Code Conjurer,
provides a background search feature not present in CodeGenie (Sourcerer's TDCS
application), but it lacks the automatic dependency slicing that allows declaratively
complete program slices to be merged into a developer's workspace, producing
self-contained code fragments that satisfy the unit tests. Sourcerer also provides
techniques for deep parsing of the declaratively incomplete code found in repositories,
which makes Sourcerer resilient and gives it an advantage in extracting and leveraging
structural information from source code collected from the 'wild'. The chapter by
Hummel and Janjic in this volume provides an in-depth discussion of Code Conjurer.
Maracatu [9, 10] is another infrastructure built for code search. Like Sourcerer,
it is limited to searching Java source code. The authors of Maracatu present useful
requirements such as index update and optimization, but it is not clear whether
Maracatu implements all of these requirements. Sourcerer, for its part, does not
have a proper mechanism for updating its index as code repositories change.
Maracatu also supports faceted search, where the facets are platform, component
type, and component model. Sourcerer's index model, being based on Lucene, supports
faceting out of the box on any metadata present in its index. However, the only
faceting implemented in an end-user search application is in Sourcerer API Search,
where the top API elements can be used as facets to filter the code results.
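Faceting of this kind amounts to counting the values of a metadata field across the current result set and then letting the user restrict the results to one of those values. The Python sketch below shows the idea in memory only; it does not use Lucene's faceting API, and the `api_elements` field, the helper names, and the sample hits are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each hit carries metadata; "api_elements" is a hypothetical field listing the
# API elements (types, methods) that a code result uses.
Hit = Dict[str, object]

def top_facets(hits: List[Hit], field: str, k: int) -> List[Tuple[str, int]]:
    """Count how many hits contain each value of `field`; return the k most common."""
    counts: Counter = Counter()
    for hit in hits:
        # De-duplicate per hit (order-preserving) so each value counts once per hit.
        counts.update(list(dict.fromkeys(hit.get(field, []))))
    return counts.most_common(k)

def filter_by_facet(hits: List[Hit], field: str, value: str) -> List[Hit]:
    """Keep only the hits whose `field` contains the selected facet value."""
    return [hit for hit in hits if value in hit.get(field, [])]

hits = [
    {"id": "A", "api_elements": ["java.io.File", "java.util.List"]},
    {"id": "B", "api_elements": ["java.io.File"]},
    {"id": "C", "api_elements": ["java.util.List", "java.util.Map"]},
]

print(top_facets(hits, "api_elements", 2))
print([h["id"] for h in filter_by_facet(hits, "api_elements", "java.io.File")])  # ['A', 'B']
```

Selecting a facet narrows the hits, and recomputing `top_facets` on the narrowed list gives the facet counts for the next refinement step.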
S6 [29] is another Test-Driven Code Search application; it applies code
transformations to convert source code found via code search into workable solutions.
Parseweb [31] is a code search application that uses source and destination object
types as the input query to retrieve code files from existing code search engines. It
applies program analysis to the retrieved files to extract method sequences that serve
as code samples for obtaining objects of the destination type from objects of the
source type. Applications such as S6 and Parseweb can easily benefit from a code
search infrastructure such as Sourcerer.
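A Parseweb query is a pair of types (source, destination), and an answer is a sequence of method calls that turns an object of the source type into an object of the destination type. Parseweb mines such sequences from code samples returned by existing search engines; the Python sketch below swaps that mining step for a much simpler, hypothetical stand-in: a breadth-first search over a hand-written table of type transitions. It is meant only to show what a shortest source-to-destination sequence looks like. The `SIGNATURES` table, its type and call names, and `method_sequence` are illustrative assumptions, not Parseweb's data or API.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical transition table: type -> [(call, type produced by the call)].
# Parseweb derives such transitions from retrieved code; here they are hand-written.
SIGNATURES: Dict[str, List[Tuple[str, str]]] = {
    "File": [("new FileReader(File)", "FileReader"), ("File.getPath()", "String")],
    "FileReader": [("new BufferedReader(Reader)", "BufferedReader")],
    "String": [("new File(String)", "File")],
    "BufferedReader": [("BufferedReader.readLine()", "String")],
}

def method_sequence(source: str, destination: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest call sequence from source to destination."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        current, calls = queue.popleft()
        if current == destination:
            return calls
        for call, produced in SIGNATURES.get(current, []):
            if produced not in visited:
                visited.add(produced)
                queue.append((produced, calls + [call]))
    return None  # No known sequence connects the two types.

print(method_sequence("File", "BufferedReader"))
# ['new FileReader(File)', 'new BufferedReader(Reader)']
```

Breadth-first order returns a shortest sequence first, which matches the intuition that shorter call chains make better code samples; a mined system would also need to rank among several sequences of equal length.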
Portfolio [26] is a code search application that incorporates structural information
in ranking and retrieval. One of its unique features is that it shows the call graph
of the functions involved in the search results. Portfolio provides search access to
over 18,000 C/C++ projects and 13,000 Java projects. As reported on its website, the
Java projects used in Portfolio come from the Sourcerer and Merobase repositories [33].
Although not a code search infrastructure, FLOSSmole [13] is another major
undertaking in building a large collection of metadata about open source projects on
the Web. Currently, FLOSSmole reports a data collection covering more than 500,000
open source projects on its website [32]. Code search infrastructure builders can now
leverage FLOSSmole's project metadata to build code repositories instead of spending
effort implementing custom spiders and crawlers for code.
Acknowledgements The author would like to thank Joel Ossher, Otavio Lemos, Trung Ngo, Huy
Huynh, Paul Rigor, and Erik Linstead for their contributions to the Sourcerer infrastructure. The
author would like to thank Cristina Lopes and Pierre Baldi for their advice and support in making
Sourcerer successful.