Fig. 8.6 Usage similarity computation based on feature vectors
similarity calculator can suggest similar entities based on three different measures of usage similarity. For this purpose, the similarity calculator uses the usage information stored in SourcererDB. The similarity calculation service works on a feature-vector representation of code entities. As shown in Fig. 8.6, for each code entity, such as the methods foo(..) and bar(..), a vector representation of the APIs it uses is stored, where each entry in the vector indicates a usage frequency (which may be binary for certain similarity measures). For example, Fig. 8.6 shows that foo(..) uses API a1 once and API a2 twice. Given a similarity measure defined on feature vectors (for example, Cosine Distance [23]), the similarity between two code entities foo(..) and bar(..) can be computed (Usage_Similarity(foo(..), bar(..))). With this collection of feature vectors, the top similar entities for each entity, based on API usage, can be computed by choosing an appropriate similarity function over feature vectors. The Structural Semantic Indexing (SSI) technique makes use of the similarity calculation service and uses three different measures of similarity. Further details on similarity calculation are available in [3] and [6].
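The vector-based computation above can be sketched as follows. This is an illustrative implementation, not Sourcerer's actual code; the vectors assume three APIs (a1, a2, a3), with foo(..)'s entries taken from the figure's example and bar(..)'s entries invented for demonstration.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length API-usage frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an entity with no API usage is similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical usage vectors over APIs (a1, a2, a3):
# foo(..) uses a1 once and a2 twice (as in Fig. 8.6);
# bar(..) uses a2 once and a3 three times (invented for illustration).
foo = [1, 2, 0]
bar = [0, 1, 3]
print(cosine_similarity(foo, bar))  # Usage_Similarity(foo(..), bar(..))
```

A measure that ignores frequencies would first binarize each vector (any nonzero entry becomes 1) before applying the same function, which is one way the service can support multiple similarity measures over the same stored vectors.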
Except for the Relational Query service, all other services are HTTP-based. Currently, three services are open to the public. A detailed description of how to use these services is available online [35].
8.7 Tools
A number of loosely coupled tools are available in the Sourcerer infrastructure.
These tools are primarily responsible for collecting/analyzing source code and pro-
ducing the stored contents.
Code Crawler: Sourcerer includes a multithreaded, plugin-based code crawler that can crawl the Web pages of online source code repositories. One of the challenges in designing the Code Crawler was adapting to the changes and differences among Web pages in different Internet repositories. To address this challenge, the crawler follows a plugin-based design: a separate plugin can be written targeting the crawl of a particular repository. This makes it possible to just update the plugin (or add