Static API Specification Mining: Exploiting Source Code Model Checking - Mining Software Specifications: Methodologies and Applications

Databases Reference

In-Depth Information

duces imprecision but retains the soundness of analysis. Infeasible traces might

occur (being incomplete) because of data-flow insensitivity of the PDMC pro-

cess, but all the program traces that put the FSM in its final state are reported

(being sound). Since determining if T\B = is undecidable, no tool can be

sound and complete at the same time. Consequently, there could be some

infeasible API sequences in the database being fed to the FCPO miner. For

example, in Figure 5.1, the variable i might never assume value 1; therefore,

the trace a ! f ! e ! c is infeasible in the program. Also, along some feasi-

ble paths, the implicit API ordering rules might be violated and APIs could

be used incorrectly (producing buggy traces with actual errors). Hence the

API sequence database might contain certain wrong API sequences. However,

we assume that most programs that we analyze are well written. Hence, we

expect only few feasible paths to be buggy, if at all. We expect to handle

infeasible and buggy traces by selecting an appropriate min sup value. The

traces generated by PDMC with Triggers can still be of infinite size in the

presence of loops. We address this problem in Section 5.2.3.

5.2.2.5

Scenario Extraction

A single static trace from the model checker might involve several API

usage scenarios, being often interspersed. We have to separate different usage

scenarios from a given trace, so that each scenario can be fed separately to

the miner. A naive algorithm for scenario extraction would be to remove all

duplicate APIs in a given trace and feed the resulting API sequence as a

single scenario to the miner. But most traces have multiple scenarios around

the same set of APIs. Furthermore, these different scenarios represent different

usage patterns among the API set. The naive algorithm of deleting duplicates

leads to loss of API ordering information and a drastic decrease in the number

of scenarios fed to the miner.

We develop a refined scenario-extraction algorithm based on identifying

producer-consumer chains (PC-chain; return value produced by one API is

consumed as an input parameter by another API to form a chain) among

APIs in the trace. The algorithm (henceforth called the PC-Chain algorithm)

is based on the assumption that related APIs have some form of data depen-

dencies between them in the form of shared variables. In short, the PC-Chain

algorithm first identifies PC-chains among APIs in traces and outputs them as

scenarios. Isolated partial orders are then constructed among APIs in related

PC-chains. Finally, partial orders are computed between heads of PC-chains,

and these partial orders form partial order clusters. As an example, consider

three sets of APIs (a;b;c;d), (e;f;g), and (h;i;j). The rst API in each set

produces a value that is consumed by the remaining APIs in the set. Fig-

ure 5.7(a) shows two traces produced by the model checker. The APIs are all

interspersed and there are three scenarios in each trace. The arrows in Fig-

ure 5.7(a) show the PC-chains among related APIs. Figure 5.7(b) shows six

different scenarios extracted from two traces. Figure 5.7(c) summarizes the

Mining Software Specifications: Methodologies and Applications

Search WWH ::

Custom Search

Home