Applying Program Analysis to Code Retrieval - Finding Source Code on the Web for Remix and Reuse

One solution to this dilemma is to alter the static analysis to accept declaratively incomplete programs. Partial program analysis, for example, guesses the fully qualified names of any missing types using a number of contextual clues [5, 7]. This allows the static analysis to function in the presence of missing types, but can degrade its performance because of the missing information. With regard to link analysis, the result is that many links will refer to unknown types or methods.

The difficulty of partial program analysis lies in the ambiguity of most languages' import mechanisms. Take Java as an example. Unresolved single-type imports are the best case: they contain a fully qualified name, and so can be matched to unresolved simple names. On-demand imports, those ending in a * wildcard, do not fully specify which types they import, instead including all types within a given package or type. This makes it unclear which package an unknown name belongs to. It could be located in the same package as the code that references it, or in any package for which an on-demand import exists.
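
To make the ambiguity concrete, the following Java sketch lists the fully qualified names that an unresolved simple name might stand for. It is an illustration under simplifying assumptions, not the technique of [5, 7]: the class `ImportCandidates`, the method `candidates`, and all package names are hypothetical, imports are passed as plain strings, and static imports and the implicit `java.lang` import are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the source text): enumerates the fully
// qualified names an unresolved simple name could refer to.
class ImportCandidates {

    // imports holds import targets as plain strings, e.g.
    // "org.example.model.Order" or "org.example.util.*".
    static List<String> candidates(String simpleName,
                                   String currentPackage,
                                   List<String> imports) {
        // Best case: a single-type import spells out the full name.
        for (String imp : imports) {
            if (imp.endsWith("." + simpleName)) {
                return List.of(imp);
            }
        }
        // Otherwise the name may belong to the current package or to any
        // package (or type) named by an on-demand import.
        List<String> result = new ArrayList<>();
        result.add(currentPackage.isEmpty()
                ? simpleName
                : currentPackage + "." + simpleName);
        for (String imp : imports) {
            if (imp.endsWith(".*")) {
                // Drop the trailing '*' and append the simple name.
                result.add(imp.substring(0, imp.length() - 1) + simpleName);
            }
        }
        return result;
    }
}
```

A name covered by a single-type import yields exactly one candidate, whereas a name that is not yields one candidate for the current package plus one per on-demand import. A partial program analysis would have to choose among these using further contextual clues.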

A different solution for accommodating declaratively incomplete programs is automated dependency resolution [13]. It attempts to automatically locate artifacts that contain the missing declarations, restoring a program to declarative completeness. Its primary benefit with regard to link analysis is that previously unknown referents can now be resolved, improving the fidelity of the link graph.

The first step is to identify the names of the missing types, which is done in much the same manner as in partial program analysis. Once identified, the names are matched against a collection of candidate artifacts that might contain the missing declarations. The goal is to identify a set of artifacts that provides all of the missing types while including as few unnecessary types as possible. When this approach was applied to a large test set of open source programs, it was found to double the number of declaratively complete programs.
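
The matching step resembles a set-cover problem, and one plausible way to approximate it is a greedy heuristic. The Java sketch below is an assumption for illustration only and is not claimed to be the algorithm of [13]: the class `ArtifactSelector`, the method `select`, and the representation of artifacts as a map from artifact name to declared type names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical greedy selection of artifacts that provide missing types.
class ArtifactSelector {

    // artifacts maps an artifact name to the fully qualified type names
    // it declares. Returns the chosen artifact names in selection order.
    static List<String> select(Set<String> missingTypes,
                               Map<String, Set<String>> artifacts) {
        Set<String> remaining = new HashSet<>(missingTypes);
        List<String> chosen = new ArrayList<>();
        while (!remaining.isEmpty()) {
            String best = null;
            int bestCovered = 0;
            int bestExtra = 0;
            for (Map.Entry<String, Set<String>> e : artifacts.entrySet()) {
                int covered = 0; // still-missing types this artifact provides
                int extra = 0;   // types it declares that were never missing
                for (String type : e.getValue()) {
                    if (remaining.contains(type)) covered++;
                    if (!missingTypes.contains(type)) extra++;
                }
                // Prefer more coverage; break ties by fewer extra types.
                if (covered > bestCovered
                        || (covered > 0 && covered == bestCovered && extra < bestExtra)) {
                    best = e.getKey();
                    bestCovered = covered;
                    bestExtra = extra;
                }
            }
            if (best == null) {
                break; // no candidate provides any of the remaining types
            }
            chosen.add(best);
            remaining.removeAll(artifacts.get(best));
        }
        return chosen;
    }
}
```

A greedy choice does not guarantee the smallest possible number of extra types, and this sketch simply stops when some types cannot be found in any candidate, in which case the program remains declaratively incomplete.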

11.6 Dependency Slicing

So far, this chapter has provided an overview of how code retrieval systems function and how static analysis can be used to improve them. The remainder of the chapter describes in detail a single application of static analysis to code retrieval. This should provide insight into the complexities involved in integrating static analysis into code retrieval.

The application we will focus on is dependency slicing. Dependency slicing is designed to identify the minimal set of declarations required for a set of seed declarations to compile and execute properly, and is similar to approaches used for reducing the size of jar files [15]. Its purpose is to package up the result of a search so that it can be imported into a project and immediately reused. CodeGenie, a tool for test-driven code reuse, uses Sourcerer's dependency slicing service to integrate search results with test cases, in order to identify results that satisfy the test cases.
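
At its core, the idea of collecting everything a set of seed declarations needs can be pictured as a reachability computation over a dependency graph. The Java sketch below shows only that core idea and is not Sourcerer's implementation: the class `DependencySlicer`, the method `slice`, and the `dependsOn` map (declaration name to the declarations it uses, assumed to have been extracted beforehand by static analysis) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a slice as the transitive closure of "depends on".
class DependencySlicer {

    // dependsOn maps a declaration to the declarations it directly uses.
    static Set<String> slice(Set<String> seeds,
                             Map<String, Set<String>> dependsOn) {
        Set<String> slice = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>(seeds);
        while (!worklist.isEmpty()) {
            String decl = worklist.pop();
            if (!slice.add(decl)) {
                continue; // already in the slice
            }
            // Queue every direct dependency not yet collected.
            for (String dep : dependsOn.getOrDefault(decl, Set.<String>of())) {
                if (!slice.contains(dep)) {
                    worklist.push(dep);
                }
            }
        }
        return slice;
    }
}
```

Declarations that are unreachable from the seeds are left out, which is what keeps the packaged result small. Which relations count as dependencies is the hard part in practice, and the sketch assumes that decision has already been made.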
