Databases Reference
In-Depth Information
while out 1 (0 0 ) = f initSign g), and are therefore not merged, preventing the
precision loss.
Of course, increasing the parameters k 1 and k 2 makes the abstraction more
precise, but may negatively impact convergence.
6.5 Summarization Using Statistical Approaches
The abstract trace collection produces automata that overapproximate the
actual behavior. However, the trace collection output may represent spurious
behavior due to at least three sources of noise:
Analysis Imprecision: The output of the abstract interpretation is an
over-approximation that may include behavior from infeasible paths.
Bugs in Training Corpus: Programs in the training corpus may con-
tain a (hopefully small) number of incorrect usages.
Unrestricted Methods: Some API methods (e.g., side-effect free
methods) may not play a role in the intended API specification, but
may still appear in the collected abstract traces.
To deal with unrestricted methods, one could leverage component-side
techniques to analyze the API implementation, identify side-effect-free meth-
ods, and exclude them from consideration [27]. Similarly, we could apply
component-side analysis to exclude spurious patterns which lead to viola-
tions of simple safety or liveness properties. We elide further discussion of
such techniques as they fall outside the scope of this work, which focuses on
client-side techniques.
To deal with the other sources of noise, we turn to statistical techniques
inspired by approaches such as z-ranking [9] and the ranking of [30]. Statistical
techniques distinguish signal from noise according to sample frequency. A
crucial factor concerns what relative weight to assign to each sample.
We observe that each static occurrence of a usage pattern represents some
thought and work by a programmer, while each dynamic occurrence typically
represents an iteration of a loop counter. We assign weights to patterns based
on a conjecture that the number of times an API usage pattern appears in the
code provides a more meaningful metric than the number of times that code
executes. 2
Most previous work on statistical trace analyses considered raw traces con-
sisting of raw event streams [2, 5] or event pairs [30, 32]. In contrast, our work
summarizes samples that already represent summarized abstract traces, rep-
2 An empirical evaluation of this conjecture falls outside the scope of this work, but would
be an interesting direction for future work.
 
Search WWH ::




Custom Search