Sourcerer's index model incorporates these code-specific heuristics by
leveraging Lucene's semi-structured document model. For each heuristic,
the index model introduces a field that stores the terms extracted by that
heuristic. Each field is given a boost value so that some heuristics
can take priority over others, depending on the code search application. With such
an index model, a retrieval scheme for a code search application simply specifies
which fields are matched against the user query. Different strategies for retrieving
code entities can be implemented by varying these schemes. For example, the top right
corner of Fig. 8.4 shows the code snippet for the method entity createResource
(previously shown in Fig. 8.3). The bottom part of Fig. 8.4 shows an index document
with five fields, each capturing a different heuristic. The top left
part of Fig. 8.4 shows, in tabular form, how two schemes match the same
query, create icon, against the index document (and thus the method entity) differently.
Scheme 1 uses only three heuristics, whereas Scheme 2 uses all five.
Scheme 1 looks over a limited set of terms associated with the method entity
createResource. This set includes only one of the two terms in the query
create icon, namely create. Scheme 2 includes two more fields, which let it look
over a richer set of terms that contains both terms of the query. Assuming that
all terms in the query must be matched for a document to be retrieved, Scheme 2
outperforms Scheme 1: it retrieves the method entity where Scheme 1 does not,
because its additional heuristics harvest more meaningful words describing code entities.
Fig. 8.4 Incorporating heuristics in index model
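
The scheme comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sourcerer's implementation and not Lucene's scoring: the field names, the terms stored in each field, and the boost values are all hypothetical, since the contents of Fig. 8.4 are not reproduced here. The sketch models an index document as a mapping from field to terms, a retrieval scheme as a list of fields, and matching as "every query term must appear in some selected field".

```python
from typing import Dict, List, Set

# Hypothetical index document for the method entity createResource.
# Field names and terms are illustrative assumptions, not taken from Fig. 8.4.
INDEX_DOC: Dict[str, Set[str]] = {
    "simple_name": {"create", "resource"},
    "fqn": {"create", "resource", "util"},
    "params": {"string"},
    "used_fqns": {"image", "icon", "descriptor"},
    "comments": {"load", "icon", "file"},
}

# Hypothetical per-field boosts: a higher value gives that heuristic more weight.
BOOSTS: Dict[str, float] = {
    "simple_name": 3.0,
    "fqn": 2.0,
    "params": 1.0,
    "used_fqns": 1.0,
    "comments": 0.5,
}

# A retrieval scheme is just the list of fields matched against the query.
SCHEME_1: List[str] = ["simple_name", "fqn", "params"]
SCHEME_2: List[str] = SCHEME_1 + ["used_fqns", "comments"]


def visible_terms(doc: Dict[str, Set[str]], scheme: List[str]) -> Set[str]:
    """Union of the terms stored in the fields the scheme selects."""
    terms: Set[str] = set()
    for field in scheme:
        terms |= doc.get(field, set())
    return terms


def matches(doc: Dict[str, Set[str]], scheme: List[str], query: str) -> bool:
    """AND semantics: every query term must occur in some selected field."""
    available = visible_terms(doc, scheme)
    return all(term in available for term in query.lower().split())


def score(doc: Dict[str, Set[str]], scheme: List[str], query: str,
          boosts: Dict[str, float]) -> float:
    """Toy score: add a field's boost each time it contains a query term."""
    total = 0.0
    for term in query.lower().split():
        for field in scheme:
            if term in doc.get(field, set()):
                total += boosts.get(field, 1.0)
    return total


if __name__ == "__main__":
    for name, scheme in (("Scheme 1", SCHEME_1), ("Scheme 2", SCHEME_2)):
        print(name, matches(INDEX_DOC, scheme, "create icon"),
              score(INDEX_DOC, scheme, "create icon", BOOSTS))
```

With this made-up data, Scheme 1 sees create but not icon, so the document is not retrieved under all-terms matching, while Scheme 2 sees both terms. Changing the strategy means editing only the field list or the boosts, not the index itself.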