Static Specification Mining Using Automata-Based Abstractions* - Mining Software Specifications: Methodologies and Applications

Databases Reference

In-Depth Information

while out 1 (0 0 ) = f initSign g), and are therefore not merged, preventing the

precision loss.

Of course, increasing the parameters k 1 and k 2 makes the abstraction more

precise, but may negatively impact convergence.

6.5 Summarization Using Statistical Approaches

The abstract trace collection produces automata that overapproximate the

actual behavior. However, the trace collection output may represent spurious

behavior due to at least three sources of noise:

Analysis Imprecision: The output of the abstract interpretation is an

over-approximation that may include behavior from infeasible paths.

Bugs in Training Corpus: Programs in the training corpus may con-

tain a (hopefully small) number of incorrect usages.

Unrestricted Methods: Some API methods (e.g., side-effect free

methods) may not play a role in the intended API specification, but

may still appear in the collected abstract traces.

To deal with unrestricted methods, one could leverage component-side

techniques to analyze the API implementation, identify side-effect-free meth-

ods, and exclude them from consideration [27]. Similarly, we could apply

component-side analysis to exclude spurious patterns which lead to viola-

tions of simple safety or liveness properties. We elide further discussion of

such techniques as they fall outside the scope of this work, which focuses on

client-side techniques.

To deal with the other sources of noise, we turn to statistical techniques

inspired by approaches such as z-ranking [9] and the ranking of [30]. Statistical

techniques distinguish signal from noise according to sample frequency. A

crucial factor concerns what relative weight to assign to each sample.

We observe that each static occurrence of a usage pattern represents some

thought and work by a programmer, while each dynamic occurrence typically

represents an iteration of a loop counter. We assign weights to patterns based

on a conjecture that the number of times an API usage pattern appears in the

code provides a more meaningful metric than the number of times that code

executes. 2

Most previous work on statistical trace analyses considered raw traces con-

sisting of raw event streams [2, 5] or event pairs [30, 32]. In contrast, our work

summarizes samples that already represent summarized abstract traces, rep-

2 An empirical evaluation of this conjecture falls outside the scope of this work, but would

be an interesting direction for future work.

Search WWH ::

Custom Search

Home