Mining API Usage Specifications via Searching Source Code from the Web - Mining Software Specifications: Methodologies and Applications

ample includes call sites of methods in the API. These constructed queries
include keywords derived from the names of the APIs. CSEs also provide
additional options, such as the language of the API, for further filtering the
code examples during the search. For example, the search phase constructs the
query "lang:java org.apache.regexp.RE" to collect relevant code examples of the
RE class provided by the Apache library [6] via Google code search (GCS). In
the preceding query, the option "lang:java" specifies that the language under
consideration is Java. GCS returns around 2000 code examples for this query.
These code examples include information that helps in mining API usage
specifications for the RE class.
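
As a minimal sketch of the query construction described above, the following
Python snippet combines a language option with an API name. The function name
build_query and its parameters are hypothetical, introduced only for
illustration; the snippet builds the query string and does not contact any
search engine.

```python
def build_query(language: str, qualified_name: str) -> str:
    """Combine a language filter with an API name into one query string.

    Hypothetical helper: it only mirrors the "lang:<language> <name>"
    shape of the query quoted in the text.
    """
    if not language or not qualified_name:
        raise ValueError("both a language and an API name are required")
    return f"lang:{language} {qualified_name}"


# Rebuild the query used for the RE class in the text.
query = build_query("java", "org.apache.regexp.RE")
print(query)
```

Keeping the language option separate from the API name makes it easy to reuse
the same helper for APIs in other languages.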

Based on our experience with CSEs, we find that the relevance of the code
examples returned by a CSE depends primarily on the format of the query issued
to it. Without a well-formed query, a CSE can return a high number of
irrelevant code examples. For example, a basic search query for collecting
code examples of the fopen method via GCS is "lang:c fopen". GCS returns
around 752,000 code examples for the preceding basic query. When the query is
changed to a well-formed query of the form "lang:c file:.c$
[\s\*]fopen[\s]?\(" (GCS supports search with regular expressions), GCS
returns 689,000 code examples. Among the top 50 returned code examples, the
number of relevant code examples was found to double with the well-formed
query compared to the basic query. The relevance (or quality) of collected
code examples plays an important role in mining API usage specifications from
collected code examples.
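
To illustrate why the regular-expression form is more selective, the sketch
below applies the pattern from the well-formed query to a few lines of C code
using Python's re module. The sample lines and the helper name is_call_site
are made up for illustration, and Python's regex dialect is assumed to treat
this particular pattern the way the query intends; this is not how GCS itself
evaluated queries.

```python
import re

# Regular-expression part of the well-formed query from the text: whitespace
# or '*' directly before "fopen", then an optional whitespace character,
# then an opening parenthesis.
CALL_SITE = re.compile(r"[\s\*]fopen[\s]?\(")


def is_call_site(line: str) -> bool:
    """Return True if the line appears to contain a call to fopen."""
    return CALL_SITE.search(line) is not None


# Hypothetical sample lines, not taken from any real search result.
samples = [
    'FILE *fp = fopen(path, "r");',                # call site
    'out = fopen (name, "w");',                    # call site, space before '('
    "/* remember to check what fopen returns */",  # mention in a comment
    "log = my_fopen(path);",                       # different function
]

for line in samples:
    print(is_call_site(line), line)
```

A plain keyword search for fopen would return all four lines, whereas the
pattern keeps only the first two.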

Such well-formed queries can be constructed using additional features provided
by CSEs. We next present the features provided by four popular CSEs for
constructing well-formed queries.

Google code search provides features to filter search results through
additional information such as licenses, packages, and filenames. Google code
search also supports POSIX regular expressions as part of the search query.

Koders provides features to filter search results based on licenses.
Additionally, Koders supports wild-card expressions and context-based search,
such as searching within class definitions or method definitions.

Krugle provides features to filter search results based on projects and also
on contexts such as comments and function calls. Additionally, Krugle supports
a kind of search known as negative-term search, which takes a query of the
form "<term> - <negativeterm>" and excludes code examples that include
"<negativeterm>" from the search results.

Codase provides features to conduct a search based on programming languages
and further on contexts such as method calls and method definitions.
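
The negative-term search described above can be approximated locally once
code examples have been collected. The following sketch uses plain substring
matching over a made-up list of snippets; the function name
negative_term_search is hypothetical, and the code does not reflect how
Krugle implements the feature.

```python
from typing import List


def negative_term_search(snippets: List[str], term: str,
                         negative_term: str) -> List[str]:
    """Keep snippets that contain `term` but do not contain `negative_term`.

    Assumption: simple substring matching stands in for whatever matching
    a real code search engine performs.
    """
    return [s for s in snippets if term in s and negative_term not in s]


# Hypothetical collected code examples.
snippets = [
    "fp = fopen(path, mode); fclose(fp);",
    "fp = fopen(path, mode);",
    'printf("no file calls here");',
]

# Equivalent in spirit to the query "fopen - fclose".
hits = negative_term_search(snippets, "fopen", "fclose")
print(hits)
```

Here only the second snippet is kept, because the first one also contains the
negative term and the third one does not contain the search term at all.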
