Static Specification Mining Using Automata-Based Abstractions* - Mining Software Specifications: Methodologies and Applications

uses a component correctly, programmers can rely on library documentation,

on trying to understand the library code (when it is available), or on code

examples of other client programs using the library.

A lot of research has focused on the problem of mining specifications of

libraries to create a higher-level description of what constitutes a correct use of

the library (e.g., [1, 2, 5, 9, 10, 15, 19, 24, 30–32]). The majority of this research

has focused on dynamic specification mining, inferring specifications from

observed behavior of representative program runs. Dynamic approaches enjoy

the significant virtue that they learn from behavior that definitively occurs

in a run. On the flip side, dynamic approaches can learn only from available

representative runs; incomplete coverage remains a fundamental limitation.

In addition, the amount of code available for inspection vastly exceeds the

amount of code amenable to automated dynamic analysis. Dynamic analysis

requires someone to build, deploy, and set up an appropriate environment for

a program run. These tasks, difficult and time-consuming for a human, lie far

beyond the reach of today's automated technologies.
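
To make the coverage limitation concrete, here is a minimal sketch of a dynamic miner that records which event may immediately follow which in observed traces. It is an illustration only, not the technique of any cited tool; the function name `mine_successors`, the event names, and the traces are all hypothetical.

```python
from collections import defaultdict


def mine_successors(traces):
    """Return, for each event, the set of events seen immediately after it."""
    successors = defaultdict(set)
    for trace in traces:
        # Pair every event with the one that follows it in the same run.
        for current, following in zip(trace, trace[1:]):
            successors[current].add(following)
    return dict(successors)


# Hypothetical event traces collected from two program runs.
observed = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
]
spec = mine_successors(observed)

# Closing a resource right after opening it is plausible behavior, but no
# observed run exercised it, so the mined specification does not allow it.
allows_open_close = "close" in spec["open"]
```

Everything in `spec` definitely happened in some run, which is the strength of the dynamic approach; anything the runs did not cover is simply absent.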

To avoid the difficulties of running a program, a tool can grab code and

apply static program analysis to approximate its behavior. For this reason,

static analysis may add value as a complement or alternative to dynamic

analysis for specification mining.

Static analyses for specification mining can be classified as component-side,

client-side, or both. A component-side approach analyzes the implementation

of an API, and uses error conditions in the library (such as throwing an

exception) or user annotations to derive a specification.

In contrast, client-side approaches examine not the implementation of an

API, but rather the ways client programs use that API. Thus, client-side

approaches can infer specifications that represent how a particular set of clients

uses a general API, rather than approximating safe behavior for all possible

clients. In practice, this is a key distinction, since a specification of non-failing

behaviors often drastically over-estimates the intended use cases.
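
The gap between the two views can be sketched with a toy example. The sketch assumes a hypothetical component, `Resource`, whose only error condition is reading after close, and it uses brute-force enumeration of short call sequences as a stand-in for a real component-side analysis. The client usage set is likewise invented for illustration.

```python
from itertools import product


class Resource:
    """Hypothetical component: the only error is reading after close."""

    def __init__(self):
        self.closed = False

    def read(self):
        if self.closed:
            raise RuntimeError("read after close")

    def close(self):
        self.closed = True


def is_non_failing(sequence):
    """Run the calls on a fresh Resource; report whether none of them raised."""
    resource = Resource()
    try:
        for method in sequence:
            getattr(resource, method)()
    except RuntimeError:
        return False
    return True


# Component-side view: every length-3 call sequence that raises no error.
non_failing = {
    seq for seq in product(["read", "close"], repeat=3) if is_non_failing(seq)
}

# Client-side view: the sequences a (hypothetical) set of clients actually uses.
client_usage = {("read", "read", "close")}
```

Here `non_failing` also admits sequences such as closing three times in a row, which never fail but are unlikely to be intended usage, whereas `client_usage` is a strict subset of it.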

This work addresses static analysis for client-side mining, applied to API

specifications for object-oriented libraries. The central challenge is to

accurately track sequences that represent typical usage patterns of the API. In

particular, the analysis must deal with three difficult issues:

Aliasing. Objects from the target API may flow through complex

heap-allocated data structures.

Unbounded Sequence Length. The sequence of events for a particular

object may grow to any length; the static analysis must rely on

a sufficiently precise yet scalable finite abstraction of unbounded

sequences.

Noise. The analysis will inevitably infer some spurious usage patterns,

due to either analysis imprecision or incorrect client programs. A tool

must discard spurious patterns in order to output intuitive, intended

specifications.
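
The last two issues can be illustrated with a small sketch. It is not the abstraction developed in this work. It assumes a deliberately simple one in which each history is summarized by its last `k` events, so that sequences of any length map onto a finite set of automaton states, and it treats any transition seen in fewer than `min_support` client sequences as noise. All names (`build_automaton`, `prune`, `k`, `min_support`) and the event sequences are hypothetical.

```python
from collections import Counter


def build_automaton(sequences, k=1):
    """Count transitions between states, where a state is the last k events."""
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = Counter()
    for seq in sequences:
        state = ()  # empty history is the initial state
        seen = set()
        for event in seq:
            # Keep only the k most recent events: a finite abstraction.
            next_state = (state + (event,))[-k:]
            seen.add((state, event, next_state))
            state = next_state
        counts.update(seen)  # count each transition once per sequence
    return counts


def prune(counts, min_support):
    """Keep only transitions supported by at least min_support sequences."""
    return {t for t, n in counts.items() if n >= min_support}


# Hypothetical per-object event sequences extracted from client code.
client_sequences = [
    ["open", "read", "close"],
    ["open", "read", "read", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "close", "read"],  # likely a buggy client: read after close
]

counts = build_automaton(client_sequences, k=1)
kept = prune(counts, min_support=2)
```

With `k=1`, the repeated reads collapse into a single self-loop on the state `("read",)`, so the automaton stays finite however long a sequence grows. The transitions contributed only by the last sequence have support 1 and are discarded by `prune`. A threshold of this kind is a heuristic: it can also drop rare but legitimate usage.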
