Databases Reference
In-Depth Information
by sup(R), is the number of strings in SDB that support R. Given a minimum
support threshold min sup, a partial order R is frequent if sup(R) min sup.
Following the related definitions and the order containment relation, we have
the following result: for a string database SDB and partial orders R and R 0
such that R 0 R, we have sup(R 0 ) sup(R). Therefore, if R is frequent, then
R 0 is also frequent. To avoid the triviality, instead of reporting all frequent
partial orders, we can mine the representative ones only.
Let us consider the program traces in Figure 5.1 again. The four sequential
patterns can be regarded as frequent partial orders, which are supported by
Traces 2, 3, 4, and 5. As discussed before, given that the partial order R is also
supported by Strings 2, 3, 4, and 5, the four sequential patterns as frequent
partial orders are redundant. There does not exist another partial order R 0
such that R 0 is stronger than R in Figure 5.1 and is also supported by Strings
2, 3, 4, and 5. In other words, R is the strongest one among all frequent partial
orders supported by Strings 2, 3, 4, and 5. Thus, the partial order R is not
redundant and can be used as the representative of the frequent partial orders
supported by Strings 2, 3, 4, and 5. Technically, R is a frequent closed partial
order.
A partial order R is closed in a string database SDB if there exists no
partial order R 0 R such that sup(R) = sup(R 0 ). A partial order R is a
frequent closed partial order if it is both frequent and closed. We next formalize
the process of mining FCPOs from program traces.
5.2.2.3 Formalizing FCPO Mining from Program Traces
Informally, our framework mines FCPOs for the APIs specified by the
user from the program source code. Our framework addresses the following
problems:
Generating sequences of API invocations along different program paths.
These sequences are stored as a string multiset database. However, gen-
erating all traces along all execution paths is an uncomputable problem
and a trace can be of infinite size.
Finding the complete set of frequent closed partial orders from the
API sequence database with respect to a minimum support threshold
min sup.
Formally, let be the set of valid program statements in the given program
source code. Let A be the set of APIs specied by the user. A trace t 2 ,
a sequence of statements executed by a path p, is feasible if path p is feasible
in the program. Let T be the set of all feasible traces in the program. To
simplify the definitions, let us assume that all APIs in A are empty methods,
do not take any arguments, and return void . 3
For a given t2T, let A(t)2A
3 By assuming that APIs are empty methods that do not take any arguments and return
void , we restrict the program statements related to APIs from A in the program to only
 
Search WWH ::




Custom Search