shut down in recent years, operating such an engine is not per se a guaranteed success;
rather, it seems to carry a considerable risk of attracting little interest when content
and usability are not appealing enough to potential users. The prime example in this
context is certainly the failed high-profile attempt by IBM, Microsoft and SAP to
establish the so-called UDDI Business Registry (UBR) as a marketplace for (web)
services. The registry was finally closed down in early 2006, containing barely a few
hundred entries of dubious quality [ 16 ]. However, even operating a popular search
engine does not guarantee its long-term survival, as underlined by Google's recent
announcement that it would shut down its code search engine in January 2012 [ 13 ].
In spite of that, various code search engines (academic as well as commercial)
have demonstrated that advances in database and text search technology (such
as the Lucene framework [ 14 ]) have made the creation of “internet-scale” software
repositories a viable undertaking, so the repository problem can be regarded
as solved. To conclude this subsection, the following table summarizes important
characteristics of some of these second-generation software search engines.
Table 12.1: Overview of code search engines and directories

| Name              | Year | Retrieval algorithms                  | No. of artifacts | Remarks                                    |
|-------------------|------|---------------------------------------|------------------|--------------------------------------------|
| UDDI Bus. Reg.    | 2000 | Keyword matching on metadata          | < 500 services   | Shut down in 2006                          |
| Spars-J           | 2004 | Keyword matching                      | > 10⁵            | Implements ComponentRank                   |
| Koders.com        | 2004 | Keyword & name matching               | > 3 · 10⁹ LOC    | Commercial, by Black Duck SW               |
| Google Codesearch | 2006 | Keyword matching/regular expressions  | > 10⁷            | Shut down on January 15, 2012              |
| Sourcerer         | 2007 | Keyword & name matching               | > 10⁶            | Eclipse integration via CodeGenie plug-in  |
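Most engines in Table 12.1 rely on keyword matching, some combined with matching on identifier names. The following Python sketch illustrates the general idea with a minimal in-memory inverted index; it is not the implementation of any engine listed, nor Lucene's API. All names (`tokenize`, `CodeIndex`, the sample files) are hypothetical, and the camelCase splitting rule is an assumption chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split source text into lowercase search terms.

    Each identifier is kept whole (for name matching) and additionally
    broken at camelCase boundaries, so "parseHttpRequest" also yields
    "parse", "http" and "request". Underscores and punctuation act as
    separators because the word pattern below does not include them.
    """
    terms: list[str] = []
    for word in re.findall(r"[A-Za-z][A-Za-z0-9]*", text):
        terms.append(word.lower())
        # Acronyms ("XML" in "XMLParser"), capitalised words, lowercase runs, digits.
        parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+", word)
        terms.extend(p.lower() for p in parts if p.lower() != word.lower())
    return terms


class CodeIndex:
    """Hypothetical minimal inverted index: term -> set of artifact ids."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, artifact_id: str, source: str) -> None:
        for term in tokenize(source):
            self.postings[term].add(artifact_id)

    def search(self, query: str) -> set[str]:
        """Return the artifacts that contain every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = CodeIndex()
index.add("Net.java", "public Request parseHttpRequest(String raw) { ... }")
index.add("Io.java", "byte[] readFile(String path) { ... }")
print(sorted(index.search("http request")))  # ['Net.java']
print(sorted(index.search("string")))        # ['Io.java', 'Net.java']
```

Splitting identifiers is what lets a plain keyword query such as "http request" find a method called `parseHttpRequest`. Production frameworks like Lucene add ranking, persistent on-disk indexes and configurable analyzers on top of this basic structure.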
A more comprehensive overview, which demonstrates even more forcefully that today's
top-notch software search engines can easily index millions of artifacts, can be found
in [ 19 ] and in another chapter of this book [ 5 ].
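Table 12.1 notes that Spars-J implements ComponentRank, which ranks components by how they are used by other components, in the spirit of PageRank. The Python sketch below is only a generic PageRank-style power iteration over a hypothetical "uses" graph, assuming a damping factor of 0.85 and a fixed iteration count. It conveys the intuition but is not the published ComponentRank algorithm, whose refinements are not reproduced here. The function name and sample components are illustrative.

```python
def rank_components(uses: dict[str, list[str]], damping: float = 0.85,
                    iterations: int = 50) -> dict[str, float]:
    """PageRank-style weights over a hypothetical component "uses" graph.

    `uses` maps each component to the components it uses (calls, imports,
    extends). A component gains weight when it is used by components that
    are themselves heavily used.
    """
    nodes = set(uses)
    for targets in uses.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    weights = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every component keeps a small base weight (the damping term).
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = uses.get(node, [])
            if targets:
                # Pass weight on evenly along outgoing "uses" edges.
                share = damping * weights[node] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A component that uses nothing spreads its weight uniformly,
                # so the total weight stays 1.
                for target in nodes:
                    new[target] += damping * weights[node] / n
        weights = new
    return weights


uses = {
    "App": ["Logger", "HttpClient"],
    "HttpClient": ["Logger", "StringUtils"],
    "CliTool": ["StringUtils", "Logger"],
}
ranking = rank_components(uses)
for name, weight in sorted(ranking.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {weight:.3f}")
```

In this toy graph `Logger` ends up ranked highest because every other component that uses anything uses it, which matches the intuition that widely reused components are good candidates to show first in a search result.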
12.2.2 Remaining Challenges
Of the four problems identified for reuse-driven software retrieval in Sect. 12.1,
state-of-the-art software search engines have thus essentially solved the repository
problem and the representation problem by creating internet-scale collections of
software assets that can be managed with common databases or state-of-the-art
search frameworks such as the freely available Lucene [ 14 ]. However, the usability