Infrastructure for Building Code Search Applications for Developers - Finding Source Code on the Web for Remix and Reuse


CodeRank, or advanced indexing techniques leveraging structural similarity such as Sourcerer's SSI. Its Test-Driven Code Search application, Code Conjurer, provides a background search feature that is not present in CodeGenie (Sourcerer's TDCS application), but it lacks automatic dependency slicing, which allows declaratively complete program slices to be merged into a developer's workspace to create self-contained code fragments that satisfy the unit tests. Sourcerer also provides techniques for deep parsing of declaratively incomplete code found in repositories; this makes Sourcerer resilient and superior at extracting and leveraging structural information from source code collected in the 'wild'. The chapter by Hummel and Janjic in this volume provides an in-depth discussion of Code Conjurer.

Maracatu [9, 10] is another infrastructure built for code search. Like Sourcerer, it is limited to searching Java source code. The authors of Maracatu present useful requirements such as index update and optimization, but it is not clear whether Maracatu implements all of these requirements. Sourcerer does not have a proper mechanism to update its index in response to changes in code repositories. Maracatu also supports faceted search, where the facets are platform, component type, and component model. Sourcerer's index model (being based on Lucene) supports faceting out of the box on any metadata present in its index. However, the only faceting that has been implemented in an end-user search application is in Sourcerer API Search, where the top API elements can be used as facets to filter the code results.
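To make the idea of faceting on index metadata concrete, the following is a minimal in-memory sketch in Python. It does not use Lucene or any Sourcerer API; the result records, the field names (`platform`, `uses_api`), and both helper functions are hypothetical and exist only to illustrate how facet counts are computed over a result set and then used to narrow it.

```python
from collections import Counter

# Hypothetical search results; the entities and field names are illustrative only.
RESULTS = [
    {"entity": "FileCopier.copy", "platform": "J2SE",
     "uses_api": ["java.io.File", "java.io.FileInputStream"]},
    {"entity": "ZipHelper.unzip", "platform": "J2SE",
     "uses_api": ["java.io.File", "java.util.zip.ZipInputStream"]},
    {"entity": "UploadServlet.doPost", "platform": "J2EE",
     "uses_api": ["javax.servlet.http.HttpServletRequest", "java.io.File"]},
    {"entity": "CsvReader.read", "platform": "J2SE",
     "uses_api": ["java.io.BufferedReader"]},
]


def _values(doc, field):
    """Return the values of a metadata field as a list (single values are wrapped)."""
    value = doc.get(field, [])
    return value if isinstance(value, list) else [value]


def facet_counts(results, field):
    """Count how many results carry each distinct value of the given field."""
    counts = Counter()
    for doc in results:
        counts.update(set(_values(doc, field)))
    return counts


def filter_by_facet(results, field, wanted):
    """Keep only the results whose field contains the selected facet value."""
    return [doc for doc in results if wanted in _values(doc, field)]


# The most frequent API elements become the facets offered to the user.
top_apis = facet_counts(RESULTS, "uses_api").most_common(2)
# Selecting one facet value narrows the result list.
narrowed = filter_by_facet(RESULTS, "platform", "J2EE")
```

In this sketch, counting the `uses_api` field mirrors the use of top API elements as facets, while the `platform` field mirrors one of the facets listed for Maracatu. A real index would compute these counts inside the search engine rather than over an in-memory list.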

S6 [29] is another Test-Driven Code Search application that applies code transformations to convert source code found via code search into workable solutions. Parseweb [31] is another code search application; it takes a source and a destination object type as the input query and retrieves code files from existing code search engines. It applies program analysis to the retrieved files to extract method sequences that serve as code samples for obtaining the destination object type from the source type. Applications such as S6 and Parseweb can easily benefit from a code search infrastructure such as Sourcerer.
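The shape of a "source type to destination type" query can be illustrated with a small Python sketch. This is not Parseweb's method: Parseweb mines sequences from retrieved code files through program analysis, whereas the sketch below swaps in a plain breadth-first search over a hand-written table of signatures. The table `SIGNATURES` and the function `method_sequence` are assumptions made for illustration only.

```python
from collections import deque

# Hypothetical signature table: (input type, call, result type).
SIGNATURES = [
    ("String", "new File(String)", "File"),
    ("File", "new FileReader(File)", "FileReader"),
    ("FileReader", "new BufferedReader(Reader)", "BufferedReader"),
    ("File", "File.toURI()", "URI"),
    ("URI", "URI.toURL()", "URL"),
]


def method_sequence(source, destination, signatures=SIGNATURES):
    """Return a shortest list of calls leading from source to destination, or None."""
    if source == destination:
        return []
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        for in_type, call, out_type in signatures:
            # Only follow calls that start at the current type and reach a new type.
            if in_type != current or out_type in seen:
                continue
            new_path = path + [call]
            if out_type == destination:
                return new_path
            seen.add(out_type)
            queue.append((out_type, new_path))
    return None


# Query: how can a BufferedReader be obtained from a String (a file name)?
sequence = method_sequence("String", "BufferedReader")
```

Here `sequence` is the three-call chain through `File` and `FileReader`, and a query with no connecting calls, such as `method_sequence("URL", "File")`, returns `None`. In a setting like the one described above, the candidate calls would come from code found by a search engine rather than from a fixed table.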

Portfolio [26] is a code search application that incorporates structural information in ranking and retrieval. One of its unique features is that it shows the call graph of the functions involved in the search results. Portfolio provides search access to over 18,000 C/C++ projects and 13,000 Java projects. As reported on its web site, the Java projects used in Portfolio come from the Sourcerer and Merobase repositories [33].

Although not a code search infrastructure, FLOSSmole [13] is another major undertaking in building a large collection of metadata about open source projects on the Web. Currently, FLOSSmole reports on its web site a data collection covering more than 500,000 open source projects [32]. Builders of code search infrastructure can now leverage FLOSSmole's project metadata to build code repositories instead of spending effort on implementing custom spiders and crawlers for code.

Acknowledgements The author would like to thank Joel Ossher, Otavio Lemos, Trung Ngo, Huy Hunh, Paul Rigor, and Erik Linsted for their contributions to the Sourcerer infrastructure. The author would like to thank Cristina Lopes and Pierre Baldi for their advice and support in making Sourcerer successful.
