Databases Reference
In-Depth Information
negative border nor the collection of frequent itemsets for the sample,
that are also frequent in the whole. Thus, we can give no answer at this
time and must repeat the algorithm with a new random sample.
6.4.6 Why Toivonen's Algorithm Works
Clearly Toivonen's algorithm never produces a false positive, since it only re-
ports as frequent those itemsets that have been counted and found to be frequent
in the whole. To argue that it never produces a false negative, we must show
that when no member of the negative border is frequent in the whole, then
there can be no itemset whatsoever that is:
1. Frequent in the whole, but
2. In neither the negative border nor the collection of frequent itemsets for
the sample.
Suppose the contrary. That is, there is a set S that is frequent in the
whole, but not in the negative border and not frequent in the sample. Also,
this round of Toivonen's Algorithm produced an answer, which would certainly
not include S among the frequent itemsets. By monotonicity, all subsets of S
are also frequent in the whole. Let T be a subset of S that is of the smallest
possible size among all subsets of S that are not frequent in the sample.
We claim that T must be in the negative border. Surely T meets one of the
conditions for being in the negative border: it is not frequent in the sample.
It also meets the other condition for being in the negative border: each of its
immediate subsets is frequent in the sample. For if some immediate subset of
T were not frequent in the sample, then there would be a subset of S that is
smaller than T and not frequent in the sample, contradicting our selection of T
as a subset of S that was not frequent in the sample, yet as small as any such
set.
Now we see that T is both in the negative border and frequent in the whole
dataset. Consequently, this round of Toivonen's algorithm did not produce an
answer.
6.4.7 Exercises for Section 6.4
Exercise 6.4.1 : Suppose there are eight items, A, B, . . . , H, and the following
are the maximal frequent itemsets:{A, B},{B, C},{A, C},{A, D},{E}, and
{F}. Find the negative border.
Exercise 6.4.2 : Apply Toivonen's Algorithm to the data of Exercise 6.3.1,
with a support threshold of 4. Take as the sample the first row of baskets:
{1, 2, 3},{2, 3, 4},{3, 4, 5}, and{4, 5, 6}, i.e., one-third of the file. Our scaled-
down support theshold will be 1.
(a) What are the itemsets frequent in the sample?
Search WWH ::




Custom Search