Database Reference
In-Depth Information
containing a particular item that also occur in
F k and also contain that item. This
implies that if an item does not appear in at least k frequent itemsets in
F k , then that
item is no longer relevant to further support counting for finding frequent patterns.
Therefore, that item can be trimmed from the transaction. This reduces the width of
the transaction, and increases the efficiency of processing. The overhead from the
data structures is significant, and most of the advantages are obtained for patterns of
smaller length such as 2-itemsets. It was pointed out in later work [ 46 , 47 , 60 ] that
the use of triangular arrays for support counting of 2-itemsets in the context of the
Apriori method is even more efficient than such an approach.
2.3
Special Tricks for 2-Itemset Counting
A number of special tricks can be used to improve the effectiveness of 2-itemset
counting. The case of 2-itemset counting is special and is often similar for the case
of join-based and tree-based algorithms. As mentioned above, one approach is to
use a triangular array that maintains the counts of the k -patterns explicitly. For each
transaction, a nested loop can be used to explore all pairs of items in the transaction
and increment the corresponding counts in the triangular array. A number of caching
tricks can be used [ 5 ] to improve data locality access during the counting process.
However, if the number of possible items are very large, this will still be a very
significant overhead because it is needed to maintain an entry for each pair of items.
This is also very wasteful, if many of the 1-items are not frequent, or some of the
2-item counts are zero. Therefore, a possible approach would be to first prune out all
the 1-items which are not frequent. It is simply not necessary to count the support
of a 2-itemset unless both of its constituent items are frequent. A hash table can
then be used to maintain the frequency counts of the corresponding 2-itemsets. As
before, the transactions are explored in a double nested loops, and all pairs of items
are hashed into the table, with the caveat, that each of the individual items must be
frequent. The set of itemsets which satisfy the support requirements are reported.
2.4
Pruning by Support Lower Bounding
Most of the pruning tricks discussed earlier prune itemsets when they are guaranteed
not meet the required support threshold. It is also possible to skip the counting process
for an itemset if the itemset is guaranteed to meet the support threshold. Of course,
the caveat here is that the exact support of that itemset will not be available, beyond
the knowledge that it meets the minimum threshold. This is sufficient in the case of
many applications.
Consider two k -itemsets A and B that have k
1 items A
B in common. Then,
the union of the items in A and B , denoted by A
B will have exactly k
+
1 items.
Then, if sup (
·
) represent the support of an itemset, then the support of A B can
Search WWH ::




Custom Search