Fine-grained transactions. As mentioned in Section 7.3.1, Apriori
relies on transactions that group related items together. We generally
have a choice between coarse-grained and fine-grained transactions.
Coarse-grained transactions consist of all method calls added in a
single revision. Fine-grained transactions additionally group calls by
their access path. In Table 7.2, the coarse-grained transaction
corresponding to revision 1.23 of Baz.java is further subdivided into
three fine-grained transactions for objects o3, list, and iter. An
advantage of fine-grained transactions is that they are smaller and
thus make mining more efficient: the runtime depends heavily on the
size and number of frequent patterns, which are bounded by the size of
the transactions. Fine-grained transactions also tend to reduce noise
because processing is restricted to calls with a common prefix. However,
we may miss patterns containing calls with different prefixes, such as
the pattern {iterator, hasNext, next} in Table 7.2.
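The difference between the two granularities can be illustrated with a minimal Python sketch. The call list below is hypothetical and is not taken from Table 7.2; the function names and the representation of a call as an (access path, method) tuple are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical calls added in one revision: (access path, method name).
# These values are illustrative and are NOT the contents of Table 7.2.
added_calls = [
    ("o3", "open"),
    ("o3", "close"),
    ("list", "add"),
    ("list", "iterator"),
    ("iter", "hasNext"),
    ("iter", "next"),
]

def coarse_transaction(calls):
    """One transaction holding every method call added in the revision."""
    return {method for _, method in calls}

def fine_transactions(calls):
    """One transaction per access path within the same revision."""
    groups = defaultdict(set)
    for path, method in calls:
        groups[path].add(method)
    return dict(groups)

coarse = coarse_transaction(added_calls)
fine = fine_transactions(added_calls)
pattern = {"iterator", "hasNext", "next"}

# The coarse-grained transaction contains the whole pattern ...
print(pattern <= coarse)                          # True
# ... but no single fine-grained transaction does, so the pattern is missed.
print(any(pattern <= t for t in fine.values()))   # False
```

In this sketch the pattern spans two access paths (list and iter), which is why it only survives in the coarse-grained transaction.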
Mining method pairs. We can reduce the complexity even further if
we mine the revision repository only for method pairs instead of
patterns of arbitrary size. This technique has frequently been applied
to software evolution analysis and has proved successful for finding
evolutionary coupling, among other things [19,20,53]. Although very
common, method pairs can only express relatively simple usage patterns.
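As a rough sketch of pair mining, the following Python code counts how often each unordered method pair co-occurs in a transaction and keeps the pairs that reach a minimum support count. The transactions, the method names, and the threshold are hypothetical; this is a simple counting scheme under those assumptions, not necessarily the procedure used in the cited work.

```python
from collections import Counter
from itertools import combinations

def count_method_pairs(transactions):
    """Count, for each unordered pair of methods, the transactions containing both."""
    pair_counts = Counter()
    for t in transactions:
        # Sorting gives each pair one canonical (alphabetical) orientation.
        for pair in combinations(sorted(t), 2):
            pair_counts[pair] += 1
    return pair_counts

def frequent_pairs(transactions, min_support):
    """Keep only the pairs whose support count reaches min_support."""
    counts = count_method_pairs(transactions)
    return {pair: c for pair, c in counts.items() if c >= min_support}

# Hypothetical transactions (sets of method names added together).
transactions = [
    {"lock", "unlock"},
    {"lock", "unlock", "log"},
    {"open", "close"},
    {"lock", "log"},
]

result = frequent_pairs(transactions, min_support=2)
print(result)  # {('lock', 'unlock'): 2, ('lock', 'log'): 2}
```

Restricting the search to pairs means only a quadratic number of candidates per transaction has to be examined, rather than every subset.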
7.3.3 Pattern Ranking
Even when filtering is applied, the Apriori algorithm yields many
frequent patterns. However, not all of them turn out to be good usage
patterns in practice. Therefore, we use several ranking schemes when
presenting the discovered patterns to the user for review.
7.3.3.1 Standard Ranking Approaches
The mining literature provides a number of standard techniques that we
use for pattern ranking. Among them are the pattern's (1) support count,
(2) confidence, and (3) strength, where the strength of a pattern is
defined as follows.
Definition 7.3.6 The strength of pattern p is the number of strong
association rules in R of the form $p \setminus q \Rightarrow q$, where
$q \subset p$, both p and q are frequent patterns, and $q \neq \emptyset$.
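The definition can be made concrete with a small Python sketch. It assumes that q ranges over the non-empty proper subsets of p, that a rule is "strong" when its confidence reaches a minimum threshold, and that the confidence of a rule A ⇒ B is support(A ∪ B) / support(A). The names support, strength, min_support, and min_conf are hypothetical, as are the example transactions.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def strength(p, transactions, min_support, min_conf):
    """Count strong rules (p \\ q) => q over non-empty proper subsets q of p.

    Assumption: a rule is strong if its confidence is at least min_conf.
    """
    p = frozenset(p)
    sup_p = support(p, transactions)
    if sup_p == 0 or sup_p < min_support:
        return 0  # p itself is not frequent
    count = 0
    for size in range(1, len(p)):  # sizes 1 .. |p|-1: non-empty and proper
        for q in map(frozenset, combinations(sorted(p), size)):
            if support(q, transactions) < min_support:
                continue  # q must be frequent as well
            # conf((p \ q) => q) = support(p) / support(p \ q)
            if sup_p / support(p - q, transactions) >= min_conf:
                count += 1
    return count

# Hypothetical transactions.
transactions = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a"}, {"a"}]

# conf(b => a) = 3/3 = 1.0 and conf(a => b) = 3/5 = 0.6
print(strength({"a", "b"}, transactions, min_support=2, min_conf=0.8))  # 1
print(strength({"a", "b"}, transactions, min_support=2, min_conf=0.5))  # 2
```

Under these assumptions a pair has exactly two candidate rules, so its strength can only be 0, 1, or 2.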
For our experiments, we rank patterns lexicographically by their strength
and support count. However, for matching method pairs $\langle a, b \rangle$
we use the product of confidence values
$\mathrm{conf}(a \Rightarrow b) \cdot \mathrm{conf}(b \Rightarrow a)$ instead
of the strength, because the continuous nature of the product gives a more
fine-grained ranking than the strength, which takes only the values 0, 1,
and 2 for pairs.
The advantage of products over sums is that pairs where both confidence
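The two ranking keys described in this section can be sketched in Python as follows. The tuple layout, the function names, and the example numbers are hypothetical, and the descending sort order (highest strength first, ties broken by support count) is an assumption, since the text does not state the direction.

```python
def rank_patterns(patterns):
    """Sort (name, strength, support_count) tuples lexicographically.

    Assumption: higher strength ranks first; ties are broken by higher support.
    """
    return sorted(patterns, key=lambda p: (p[1], p[2]), reverse=True)

def pair_score(sup_a, sup_b, sup_ab):
    """Product conf(a => b) * conf(b => a) for a method pair <a, b>.

    conf(a => b) = sup_ab / sup_a and conf(b => a) = sup_ab / sup_b.
    """
    return (sup_ab / sup_a) * (sup_ab / sup_b)

# Hypothetical patterns: (name, strength, support count).
patterns = [("p1", 2, 5), ("p2", 3, 4), ("p3", 2, 9)]
ranked = rank_patterns(patterns)
print([name for name, _, _ in ranked])  # ['p2', 'p3', 'p1']

# Hypothetical pair: a occurs 5 times, b 3 times, both together 3 times.
score = pair_score(sup_a=5, sup_b=3, sup_ab=3)
print(round(score, 2))  # 0.6
```

Unlike the strength, which is an integer count, the score of a pair can take any value between 0 and 1.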
 