Databases Reference
In-Depth Information
3. For each frequent pair, add one to its count in the data structure used to
store counts.
Finally, at the end of the second pass, examine the structure of counts to
determine which pairs are frequent.
6.2.6 A-Priori for All Frequent Itemsets
The same technique used for finding frequent pairs without counting all pairs
lets us find larger frequent itemsets without an exhaustive count of all sets. In
the A-Priori Algorithm, one pass is taken for each set-size k. If no frequent
itemsets of a certain size are found, then monotonicity tells us there can be no
larger frequent itemsets, so we can stop.
The pattern of moving from one size k to the next size k + 1 can be sum-
marized as follows. For each size k, there are two sets of itemsets:
1. C k is the set of candidate itemsets of size k - the itemsets that we must
count in order to determine whether they are in fact frequent.
2. L k is the set of truly frequent itemsets of size k.
The pattern of moving from one set to the next and one size to the next is
suggested by Fig. 6.4.
C
L
C
L
C
L
1
1
2
2
3
3
. . .
Filter
Construct
Filter
Construct
Filter
Construct
Pairs of
frequent
items
All
items
Frequent
Frequent
items
pairs
Figure 6.4: The A-Priori Algorithm alternates between constructing candidate
sets and filtering to find those that are truly frequent
We start with C 1 , which is all singleton itemsets, i.e., the items themselves.
That is, before we examine the data, any item could be frequent as far as we
know. The first filter step is to count all items, and those whose counts are at
least the support threshold s form the set L 1 of frequent items.
The set C 2 of candidate pairs is the set of pairs both of whose items are
in L 1 ; that is, they are frequent items. Note that we do not construct C 2
explicitly. Rather we use the definition of C 2 , and we test membership in C 2 by
testing whether both of its members are in L 1 . The second pass of the A-Priori
Algorithm counts all the candidate pairs and determines which appear at least
s times. These pairs form L 2 , the frequent pairs.
We can follow this pattern as far as we wish. The set C 3 of candidate
triples is constructed (implicitly) as the set of triples, any two of which is a
pair in L 2 . Our assumption about the sparsity of frequent itemsets, outlined
Search WWH ::




Custom Search