Path-Aware Static Program Analyses for Specification Mining* - Mining Software Specifications: Methodologies and Applications

Databases Reference

In-Depth Information

only maximal subsequences, i.e., it must be the case that every sequence(si) i )

in S is not a subsequence of any other sequence present in S.

For example, if the set of sequences is given by if ( a b c e ),

( a d c e ),( d a c e ),( a c d e f ),

( e f d c a ) g, a sequence miner detects ( a c e ) as a

frequently occurring subsequence. Observe that the same set of transactions

without ordering in frequent itemset mining would generate the set f a,c,

d,e g. For our application, a transaction corresponds to a call-site and the

sequence within a transaction corresponds to the sequence of procedure calls

that occurred before the call-site. Our implementation uses the Apriori-all al-

gorithm by Agrawal and Srikant [2], which is known to scale to over a million

sequences. For the example shown in Figure 9.8, we generate the specification

allocbuf lock readbuf .

9.2.5.2

The Structural Similarity Problem

In Figure 9.8, the names of the variables are the same across multiple

call-sites whereas this does not hold in real programs (as noted earlier in

this section). In other words, as discussed earlier, predicates computed along

different paths may share structural, if not syntactic, similarities. In order

to capture such similarities, a technique to determine the locations, names,

values, etc. that can be abstracted uniformly among different sets is necessary.

Consider every predicate expression as being mapped to a set of locations.

Thus, assume a set of location sets,

fL 1 = f` 11 ;` 12 ;:::` 1m 1 g;L 2 = f` 21 ;` 22 ;:::` 2m 2 g; :::;L k = f` k1 ;` k2 ;:::` 1m k gg

where L i corresponds to locations associated with predicates that reach

call-site i of procedure P. Now, for every element in Li, i , we wish to find a

corresponding element in every other L j such that the cumulative matching

of the attribute sets for such a mapping is maximal. Given three sets A;B;

and C, we say A and B match maximally, if and only if j A\B j is greater

than j A\C j or j B\C j.

Theorem 1. The maximal matching problem as stated above is NP-hard.

Proof By reduction from maximal bipartite (k;)-clique in a bipartite graph

[23, 45].

Fortunately, there are a number of heuristics that can be employed to

map locations based on semantic information available within programs. We

describe below heuristics that match the attribute sets across multiple call-

sites that we have used in our implementation.

Type: Attribute sets can be divided based on the type of variable. e.g.,

two variables, x and y with attributes [ x : f(:=, true )g] and [ y : f(>,

20)g] can never be matched.

Search WWH ::

Custom Search

Home