setAddress(String):void;
)
In contrast to ordinary programming languages, this language also supports
additional search constraints (such as lang:java), in other words faceted
searches, and even special wildcards that finally make purely signature-based
searches possible as well. So as not to interfere with Lucene's query syntax, the
dollar sign can be used in place of any name in an interface, i.e. Customer,
getAddress or setAddress in the above example; the last method, for instance,
then becomes $(String):void.
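The effect of the dollar-sign wildcard can be sketched as a translation into a regular expression. The following Java sketch is only an illustration, not the search engine's actual implementation: it assumes that signatures are stored as plain strings of the form name(ParamTypes):ReturnType, and the class SignatureWildcard and its methods are hypothetical. Each $ matches exactly one identifier, while everything else in the pattern must match literally.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper illustrating '$' as a name wildcard in signature patterns.
class SignatureWildcard {

    // Translates a signature pattern such as "$(String):void" into a regex.
    // Each '$' becomes "one Java-style identifier"; all other text is quoted
    // so that characters like '(' or ')' are matched literally.
    static Pattern compile(String signaturePattern) {
        StringBuilder regex = new StringBuilder();
        // limit -1 keeps empty segments, e.g. before a leading '$'
        String[] parts = signaturePattern.split("\\$", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append("[A-Za-z_][A-Za-z0-9_]*");
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i]));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True if the whole signature matches the pattern.
    static boolean matches(String signaturePattern, String signature) {
        return compile(signaturePattern).matcher(signature).matches();
    }

    public static void main(String[] args) {
        List<String> signatures = List.of(
                "getAddress():String",
                "setAddress(String):void");
        for (String s : signatures) {
            System.out.println(s + " -> " + matches("$(String):void", s));
        }
    }
}
```

Running main prints false for getAddress():String and true for setAddress(String):void. Because each $ stands for a single name only, the parameter and return types stay fixed, which is what makes a search on the pure signature meaningful.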
5.5.2 Retrieval Heuristics
It is intuitively clear that the more complex an API-based query becomes, the
fewer components are likely to match it [ 32 ]. Thus, the number of retrieved results
usually drops quickly as the size of the desired API grows. Zaremski and Wing made
a similar observation in the context of signature matching [ 11 ] and therefore
proposed so-called “relaxed searches” that also admit imperfect matches into the
result set. The basic idea is to implement a signature-aware relevancy estimation
that boosts the relevancy of those search results that are likely to conform better
to the user's expectations. Consider the “matrix” example used previously: Lucene's
standard ranking algorithm would assign the highest relevancy to those artifacts
that contain the term “matrix” most often. An actual matrix implementation that
perhaps contains the term just once, in its class definition, will therefore usually
receive a rather low relevancy compared with artifacts that use it multiple times.
This can be overcome by attaching certain fields (such as class or method names)
to the query with extra weight (in Lucene terminology, these terms are “boosted”).
To avoid “overlooking” the term in the actual content when it does not appear in
the name, the fields are combined with a Boolean OR, as in the following simple
example:
content:matrix OR name:matrix^2
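Why the boost helps can be illustrated with a deliberately simplified scoring model. The sketch below is not Lucene's ranking formula (which also takes document frequency and field length into account); it merely adds the term's frequency in a content field to its frequency in a name field multiplied by a boost factor. The two index entries, the class BoostedRankingToy and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy model of field boosting; NOT Lucene's actual scoring.
class BoostedRankingToy {

    // A hypothetical index entry with a "name" and a "content" field.
    record Doc(String name, String content) {}

    // Counts whole-word, case-insensitive occurrences of term in text.
    static int termFrequency(String text, String term) {
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Toy score: frequency in content plus nameBoost times frequency in name.
    static double score(Doc doc, String term, double nameBoost) {
        return termFrequency(doc.content(), term)
                + nameBoost * termFrequency(doc.name(), term);
    }

    // Returns the entry names ordered by descending toy score.
    static List<String> rank(List<Doc> docs, String term, double nameBoost) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((Doc d) -> score(d, term, nameBoost)).reversed());
        List<String> names = new ArrayList<>();
        for (Doc d : sorted) {
            names.add(d.name());
        }
        return names;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("Matrix", "class Matrix { double[][] cells; }"),
                new Doc("ImageFilter", "applies a matrix to pixels; the kernel matrix is small"));
        System.out.println("content only:     " + rank(docs, "matrix", 0.0));
        System.out.println("name boosted (2): " + rank(docs, "matrix", 2.0));
    }
}
```

With a boost of 0, the ImageFilter entry, which mentions “matrix” twice, ranks first; with a boost of 2, the entry named Matrix overtakes it, mirroring the intent of name:matrix^2.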
Although the term “matrix” is searched for within the whole content of the index
entries, special attention is given to the name of the class, since it is assigned
double importance. Hence a result file that is actually named Matrix will
automatically be ranked higher by Lucene than one that merely contains the string
“matrix” somewhere. This approach can easily be extended to method names as well:
content:matrix content:add content:multiply OR name:matrix^2
OR method:multiply OR method:add
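Queries of this shape can be generated mechanically from a desired interface. The following Java sketch uses a hypothetical helper, BoostedQueryBuilder.build, that takes a class name and method names and emits a query string in the style above. Lowercasing the terms is an assumption about how the index was analyzed, and the boost factor of 2 is simply the value used in the examples.

```java
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical builder for boosted, API-based Lucene query strings.
class BoostedQueryBuilder {

    // Content terms are separated by spaces; Lucene's classic QueryParser
    // combines such clauses with OR by default. The class name is also
    // searched in the "name" field with a boost, and each method name in
    // the "method" field. Terms are lowercased (assumed index analysis).
    static String build(String className, List<String> methodNames, int nameBoost) {
        String cls = className.toLowerCase(Locale.ROOT);
        StringJoiner query = new StringJoiner(" ");
        query.add("content:" + cls);
        for (String m : methodNames) {
            query.add("content:" + m.toLowerCase(Locale.ROOT));
        }
        query.add("OR name:" + cls + "^" + nameBoost);
        for (String m : methodNames) {
            query.add("OR method:" + m.toLowerCase(Locale.ROOT));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("Matrix", List.of("add", "multiply"), 2));
    }
}
```

For Matrix with add and multiply, it prints content:matrix content:add content:multiply OR name:matrix^2 OR method:add OR method:multiply, which differs from the query above only in the order of the method clauses; the order of clauses in a disjunction does not affect which entries match.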
Similar approaches are possible to better support camelCased search terms (e.g.