Databases Reference
In-Depth Information
duces imprecision but retains the soundness of analysis. Infeasible traces might
occur (being incomplete) because of data-flow insensitivity of the PDMC pro-
cess, but all the program traces that put the FSM in its final state are reported
(being sound). Since determining if T\B = is undecidable, no tool can be
sound and complete at the same time. Consequently, there could be some
infeasible API sequences in the database being fed to the FCPO miner. For
example, in Figure 5.1, the variable i might never assume value 1; therefore,
the trace a ! f ! e ! c is infeasible in the program. Also, along some feasi-
ble paths, the implicit API ordering rules might be violated and APIs could
be used incorrectly (producing buggy traces with actual errors). Hence the
API sequence database might contain certain wrong API sequences. However,
we assume that most programs that we analyze are well written. Hence, we
expect only few feasible paths to be buggy, if at all. We expect to handle
infeasible and buggy traces by selecting an appropriate min sup value. The
traces generated by PDMC with Triggers can still be of infinite size in the
presence of loops. We address this problem in Section 5.2.3.
5.2.2.5
Scenario Extraction
A single static trace from the model checker might involve several API
usage scenarios, being often interspersed. We have to separate different usage
scenarios from a given trace, so that each scenario can be fed separately to
the miner. A naive algorithm for scenario extraction would be to remove all
duplicate APIs in a given trace and feed the resulting API sequence as a
single scenario to the miner. But most traces have multiple scenarios around
the same set of APIs. Furthermore, these different scenarios represent different
usage patterns among the API set. The naive algorithm of deleting duplicates
leads to loss of API ordering information and a drastic decrease in the number
of scenarios fed to the miner.
We develop a refined scenario-extraction algorithm based on identifying
producer-consumer chains (PC-chain; return value produced by one API is
consumed as an input parameter by another API to form a chain) among
APIs in the trace. The algorithm (henceforth called the PC-Chain algorithm)
is based on the assumption that related APIs have some form of data depen-
dencies between them in the form of shared variables. In short, the PC-Chain
algorithm first identifies PC-chains among APIs in traces and outputs them as
scenarios. Isolated partial orders are then constructed among APIs in related
PC-chains. Finally, partial orders are computed between heads of PC-chains,
and these partial orders form partial order clusters. As an example, consider
three sets of APIs (a;b;c;d), (e;f;g), and (h;i;j). The rst API in each set
produces a value that is consumed by the remaining APIs in the set. Fig-
ure 5.7(a) shows two traces produced by the model checker. The APIs are all
interspersed and there are three scenarios in each trace. The arrows in Fig-
ure 5.7(a) show the PC-chains among related APIs. Figure 5.7(b) shows six
different scenarios extracted from two traces. Figure 5.7(c) summarizes the
 
Search WWH ::




Custom Search