Fig. 8.6 Usage similarity computation based on feature vectors
similarity calculator can suggest similar entities based on three different measures of usage similarity. For this purpose, the similarity calculator uses the usage information stored in SourcererDB. The similarity calculation service works on a feature-vector representation of code entities. As shown in Fig. 8.6, for each code entity, such as the methods foo(..) and bar(..), a vector representation of the APIs it uses is stored, where each entry in the vector indicates a usage frequency (which may be binary for certain similarity measures). For example, Fig. 8.6 shows that foo(..) uses API a1 once and API a2 twice. Given a similarity measure defined on feature vectors (for example, Cosine Distance [23]), the similarity between two code entities foo(..) and bar(..) can be computed (Usage_Similarity(foo(..), bar(..))). With this collection of feature vectors, the top similar entities for each entity, based on API usage, can be computed by choosing an appropriate similarity function over feature vectors. The Structural Semantic Indexing (SSI) technique makes use of the similarity calculation service and uses three different measures of similarity. Further details on similarity calculation are available in [3] and [6].
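The vector-based computation above can be sketched as follows. This is an illustrative implementation, not Sourcerer's actual code; the vectors assume three APIs (a1, a2, a3), with foo(..)'s entries taken from the figure's example and bar(..)'s entries invented for demonstration.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length API-usage frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an entity with no API usage is similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical usage vectors over APIs (a1, a2, a3):
# foo(..) uses a1 once and a2 twice (as in Fig. 8.6);
# bar(..) uses a2 once and a3 three times (invented for illustration).
foo = [1, 2, 0]
bar = [0, 1, 3]
print(cosine_similarity(foo, bar))  # Usage_Similarity(foo(..), bar(..))
```

A measure that ignores frequencies would first binarize each vector (any nonzero entry becomes 1) before applying the same function, which is one way the service can support multiple similarity measures over the same stored vectors.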
Except for the Relational Query service, all other services are HTTP-based. Currently, three services are open to the public. A detailed description of how to use these services is available online [35].
8.7 Tools
A number of loosely coupled tools are available in the Sourcerer infrastructure.
These tools are primarily responsible for collecting/analyzing source code and pro-
ducing the stored contents.
Code Crawler: Sourcerer includes a multithreaded, plugin-based code crawler that can crawl the Web pages of online source code repositories. One of the challenges in designing the Code Crawler was adapting to the changes and differences among Web pages in different Internet repositories. To address this challenge, the crawler follows a plugin-based design: a separate plugin can be written targeting the crawl of a particular repository. This makes it possible to just update the plugin (or add