Frequent Itemsets - Mining of Massive Datasets

Databases Reference

In-Depth Information

negative border nor the collection of frequent itemsets for the sample,

that are also frequent in the whole. Thus, we can give no answer at this

time and must repeat the algorithm with a new random sample.

6.4.6 Why Toivonen's Algorithm Works

Clearly Toivonen's algorithm never produces a false positive, since it only re-

ports as frequent those itemsets that have been counted and found to be frequent

in the whole. To argue that it never produces a false negative, we must show

that when no member of the negative border is frequent in the whole, then

there can be no itemset whatsoever that is:

1. Frequent in the whole, but

2. In neither the negative border nor the collection of frequent itemsets for

the sample.

Suppose the contrary. That is, there is a set S that is frequent in the

whole, but not in the negative border and not frequent in the sample. Also,

this round of Toivonen's Algorithm produced an answer, which would certainly

not include S among the frequent itemsets. By monotonicity, all subsets of S

are also frequent in the whole. Let T be a subset of S that is of the smallest

possible size among all subsets of S that are not frequent in the sample.

We claim that T must be in the negative border. Surely T meets one of the

conditions for being in the negative border: it is not frequent in the sample.

It also meets the other condition for being in the negative border: each of its

immediate subsets is frequent in the sample. For if some immediate subset of

T were not frequent in the sample, then there would be a subset of S that is

smaller than T and not frequent in the sample, contradicting our selection of T

as a subset of S that was not frequent in the sample, yet as small as any such

set.

Now we see that T is both in the negative border and frequent in the whole

dataset. Consequently, this round of Toivonen's algorithm did not produce an

answer.

6.4.7 Exercises for Section 6.4

Exercise 6.4.1 : Suppose there are eight items, A, B, . . . , H, and the following

are the maximal frequent itemsets:{A, B},{B, C},{A, C},{A, D},{E}, and

{F}. Find the negative border.

Exercise 6.4.2 : Apply Toivonen's Algorithm to the data of Exercise 6.3.1,

with a support threshold of 4. Take as the sample the first row of baskets:

{1, 2, 3},{2, 3, 4},{3, 4, 5}, and{4, 5, 6}, i.e., one-third of the file. Our scaled-

down support theshold will be 1.

(a) What are the itemsets frequent in the sample?

Search WWH ::

Custom Search

Home