Figure 8.2 shows Sourcerer's relational model as an ER diagram, depicting the
five elements of the model and a set of attributes for each of them.
Table 8.3 provides the details of all the attributes of the model elements. Together, Figure 8.2
and Table 8.3 show how the model elements are linked with each
other, and how attributes in the relational model link the relational model elements
with the storage model. For example, the Project element's 'path' attribute links
it to the physical location defined by the storage model.
Various tools in Sourcerer make use of this information to connect the relational
information with the textual contents stored in the physical files.
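To make the link between the two models concrete, here is a minimal Python sketch that resolves a code entity to its source text by joining relational records and then following the project's 'path' into storage. The table and column names (`projects`, `files`, `entities`, `char_offset`, `char_length`) and the functions `entity_source` and `demo` are hypothetical stand-ins, not Sourcerer's actual schema (Table 8.3 documents that); recording an entity's location as a character offset and length within its file is also an assumption of this sketch.

```python
import os
import sqlite3
import tempfile

# Hypothetical relational schema; names are illustrative, not Sourcerer's own.
SCHEMA = """
CREATE TABLE projects (project_id INTEGER PRIMARY KEY, name TEXT, path TEXT);
CREATE TABLE files (file_id INTEGER PRIMARY KEY, project_id INTEGER, path TEXT);
CREATE TABLE entities (entity_id INTEGER PRIMARY KEY, fqn TEXT, file_id INTEGER,
                       char_offset INTEGER, char_length INTEGER);
"""


def entity_source(conn, fqn):
    """Return the source text of the entity named `fqn`, or None if unknown."""
    row = conn.execute(
        """SELECT p.path, f.path, e.char_offset, e.char_length
           FROM entities e
           JOIN files f ON e.file_id = f.file_id
           JOIN projects p ON f.project_id = p.project_id
           WHERE e.fqn = ?""",
        (fqn,),
    ).fetchone()
    if row is None:
        return None
    project_path, file_path, start, length = row
    # The project's 'path' attribute points into the storage model (the file system).
    with open(os.path.join(project_path, file_path), encoding="utf-8") as fh:
        return fh.read()[start:start + length]


def demo():
    """Store one Java file on disk, describe it relationally, then look it up."""
    code = "class Foo { void bar() {} }"
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "src"))
        with open(os.path.join(root, "src", "Foo.java"), "w", encoding="utf-8") as fh:
            fh.write(code)
        conn = sqlite3.connect(":memory:")
        conn.executescript(SCHEMA)
        conn.execute("INSERT INTO projects VALUES (1, 'demo', ?)", (root,))
        conn.execute("INSERT INTO files VALUES (1, 1, 'src/Foo.java')")
        method = "void bar() {}"
        conn.execute(
            "INSERT INTO entities VALUES (1, 'Foo.bar()', 1, ?, ?)",
            (code.index(method), len(method)),
        )
        result = entity_source(conn, "Foo.bar()")
        conn.close()
        return result


print(demo())  # prints void bar() {}
```

Keeping only a location in the relational records, and the text itself in files, means the database stays small while tools can still recover exact source on demand.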
Entities and Relations are the key elements of Sourcerer's relational
model that enable code-specific search capabilities. Capturing fully
qualified names (FQNs) for code entities, and associating them with those
entities, allows code entities to be referred to and looked up across
projects using the FQNs as keys. FQNs therefore enable the analysis of
relations across projects. This led to innovative uses of structural
information in code search applications, such as: (i) computing CodeRank
(an adaptation of Google's PageRank algorithm to the code graph) and using
it as a ranking heuristic in SCSE, and (ii) using feature vectors made up
of the FQNs of used entities as a basis for computing usage similarity
between entities in SSI.
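The sketch below illustrates both applications in Python, assuming a toy code graph whose nodes are FQN strings. `code_rank` is a plain PageRank power iteration (the damping factor 0.85 is a common default, not a value taken from SCSE), and `usage_similarity` compares vectors of used-FQN counts with cosine similarity; the text does not say which similarity measure SSI uses, so cosine is an assumption here. The function names and all FQNs in the example are made up.

```python
import math
from collections import Counter


def code_rank(edges, damping=0.85, iterations=50):
    """PageRank by power iteration over a code graph whose nodes are FQNs.

    edges: (source_fqn, target_fqn) pairs, e.g. "calls" or "uses" relations.
    Because nodes are FQN strings, edges from different projects that name the
    same entity automatically meet at one node.
    """
    edges = list(edges)
    nodes = sorted({fqn for edge in edges for fqn in edge})
    if not nodes:
        return {}
    out_links = {fqn: [] for fqn in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {fqn: 1.0 / n for fqn in nodes}
    for _ in range(iterations):
        # Rank held by entities with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def usage_similarity(used_a, used_b):
    """Cosine similarity between two entities' vectors of used FQNs."""
    a, b = Counter(used_a), Counter(used_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


edges = [
    ("p1.App.main()", "lib.Util.parse(String)"),  # project p1 uses the library
    ("p2.Tool.run()", "lib.Util.parse(String)"),  # project p2 uses it too
    ("lib.Util.parse(String)", "java.lang.String.trim()"),
]
ranks = code_rank(edges)
for fqn in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{ranks[fqn]:.3f}  {fqn}")

print(usage_similarity(
    ["lib.Util.parse(String)", "java.lang.String.trim()"],
    ["lib.Util.parse(String)", "java.util.List.add(Object)"],
))  # prints 0.5
```

Because both projects refer to the library method by the same FQN, its incoming edges accumulate at a single node, which is exactly what makes cross-project ranking possible.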
8.4.3 Index Model
The Index Model complements Sourcerer's relational model by facilitating the
application of information retrieval techniques to code entities. The index model
specifies a Document representation for each code Entity in the relational model.
A document in the index model is made up of a collection of Fields. Each field has
a name and can hold different types of values, the most fundamental
being a collection of Terms. A term is the basic unit of search and retrieval. Terms are
extracted from various parts of an entity and stored in the corresponding fields of the
document representing that entity.
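A minimal Python sketch of this representation follows. The `Document` and `Field` classes mirror the vocabulary used above, but the field names `fqn_terms` and `body_terms`, the helper `entity_document`, and the identifier-splitting rule in `extract_terms` are illustrative assumptions, not Sourcerer's actual field layout or tokenizer.

```python
import re
from dataclasses import dataclass, field as dc_field

# Splits identifiers into words: camelCase, acronyms, digits; punctuation is dropped.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z]|\d|\b)|[A-Z]?[a-z]+|[A-Z]+|\d+")


def extract_terms(text):
    """Turn code text into lowercase terms, e.g. 'parseHTTPHeader' -> parse, http, header."""
    return [word.lower() for word in _WORD.findall(text)]


@dataclass
class Field:
    """A named field; in this sketch its only value type is a list of terms."""
    name: str
    terms: list = dc_field(default_factory=list)


@dataclass
class Document:
    """Index-model document standing for one code entity."""
    entity_fqn: str
    fields: dict = dc_field(default_factory=dict)

    def add(self, name, terms):
        self.fields.setdefault(name, Field(name)).terms.extend(terms)


def entity_document(fqn, source):
    """Build a document from two parts of an entity: its FQN and its source text."""
    doc = Document(fqn)
    doc.add("fqn_terms", extract_terms(fqn))
    doc.add("body_terms", extract_terms(source))
    return doc


doc = entity_document("lib.Util.parseHTTPHeader(String)", "return raw.trim();")
for f in doc.fields.values():
    print(f.name, f.terms)
```

Splitting compound identifiers into words is one common way to let a query such as "parse header" match a method named `parseHTTPHeader`; other tokenization choices are equally possible.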
Sourcerer's information retrieval component is based on the popular Lucene [41]
information retrieval engine. Therefore, its index model conforms to the way Lucene
models its contents. More details on Lucene's content model are available in [25].
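Lucene indexes the terms of each field in an inverted index, which maps a term within a given field to the documents that contain it. The toy Python class below shows that idea and how a query can be restricted to one field. `FieldIndex`, its methods, and the field names are hypothetical; this is not Lucene's API, and real Lucene adds scoring, analyzers, and on-disk index segments that the sketch omits.

```python
from collections import defaultdict


class FieldIndex:
    """Toy inverted index over documents made of named fields of terms.

    It mimics the shape of Lucene's content model only; it is not Lucene's API.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # (field name, term) -> ids of matching documents
        self.documents = {}               # document id -> {field name: [terms]}

    def add_document(self, doc_id, fields):
        self.documents[doc_id] = fields
        for name, terms in fields.items():
            for term in terms:
                self.postings[(name, term)].add(doc_id)

    def search(self, field, terms):
        """Ids of documents whose `field` contains every query term (Boolean AND)."""
        matches = [self.postings.get((field, term), set()) for term in terms]
        return sorted(set.intersection(*matches)) if matches else []


index = FieldIndex()
index.add_document(1, {"name": ["parse", "header"], "entity_type": ["method"]})
index.add_document(2, {"name": ["header", "parser"], "entity_type": ["class"]})
print(index.search("name", ["header"]))           # prints [1, 2]
print(index.search("name", ["parse", "header"]))  # prints [1]
print(index.search("entity_type", ["class"]))     # prints [2]
```

Keying postings by (field, term) is what lets the same word match in one field without matching in another, which is the property a field-based index model relies on.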
Fields in Sourcerer's index model can be categorized into five types:
1. Fields for basic retrieval, which store terms coming from various parts of a code
entity.
2. Fields for retrieval with signatures, which store terms coming from method
signatures as well as terms that indicate the number of arguments a method has.
3. Fields storing metadata, for example the type of the entity, so that a search can
be limited to one or more types of entities.