Database Reference
In-Depth Information
Also notice that the benefit of eliminating infrequent items is amplified; if only half the
items are frequent we need one quarter of the space to count. Likewise, if we use the triples
method, we need to count only those pairs of two frequent items that occur in at least one
basket.
The mechanics of the second pass are as follows.
(1) For each basket, look in the frequent-items table to see which of its items are frequent.
(2) In a double loop, generate all pairs of frequent items in that basket.
(3) For each such pair, add one to its count in the data structure used to store counts.
Finally, at the end of the second pass, examine the structure of counts to determine which
pairs are frequent.
6.2.6
A-Priori for All Frequent Itemsets
The same technique used for finding frequent pairs without counting all pairs lets us find
larger frequent itemsets without an exhaustive count of all sets. In the A-Priori Algorithm,
one pass is taken for each set-size k . If no frequent itemsets of a certain size are found, then
monotonicity tells us there can be no larger frequent itemsets, so we can stop.
The pattern of moving from one size k to the next size k +1 can be summarized as follows.
For each size k , there are two sets of itemsets:
(1) C k is the set of candidate itemsets of size k - the itemsets that we must count in order
to determine whether they are in fact frequent.
(2) L k is the set of truly frequent itemsets of size k .
The pattern of moving from one set to the next and one size to the next is suggested by Fig.
6.4 .
Figure 6.4 The A-Priori Algorithm alternates between constructing candidate sets and filtering to find those that are
truly frequent
We start with C 1 , which is all singleton itemsets, i.e., the items themselves. That is, be-
fore we examine the data, any item could be frequent as far as we know. The first filter step
is to count all items, and those whose counts are at least the support threshold s form the
set L 1 of frequent items.
Search WWH ::




Custom Search