Frequent Itemsets - Mining of Massive Datasets

Databases Reference

In-Depth Information

Exercise 6.1.3 : Suppose there are 100 items, numbered 1 to 100, and also 100

baskets, also numbered 1 to 100. Item i is in basket b if and only if b divides i

with no remainder. For example, basket 12 consists of items

{12, 24, 36, 48, 60, 72, 84, 96}

Repeat Exercise 6.1.1 for this data.

! Exercise 6.1.4 : This question involves data from which nothing interesting

can be learned about frequent itemsets, because there are no sets of items that

are correlated. Suppose the items are numbered 1 to 10, and each basket is

constructed by including item i with probability 1/i, each decision being made

independently of all other decisions. That is, all the baskets contain item 1,

half contain item 2, a third contain item 3, and so on. Assume the number of

baskets is su ciently large that the baskets collectively behave as one would

expect statistically. Let the support threshold be 1% of the baskets. Find the

frequent itemsets.

Exercise 6.1.5 : For the data of Exercise 6.1.1, what is the confidence of the

following association rules?

(a){5, 7}→2.

(b){2, 3, 4}→5.

Exercise 6.1.6 : For the data of Exercise 6.1.3, what is the confidence of the

following association rules?

(a){24, 60}→8.

(b){2, 3, 4}→5.

!! Exercise 6.1.7 : Describe all the association rules that have 100% confidence

for the market-basket data of:

(a) Exercise 6.1.1.

(b) Exercise 6.1.3.

! Exercise 6.1.8 : Prove that in the data of Exercise 6.1.4 there are no interest-

ing association rules; i.e., the interest of every association rule is 0.

6.2

Market Baskets and the A-Priori Algorithm

We shall now begin a discussion of how to find frequent itemsets or information

derived from them, such as association rules with high support and confidence.

The original improvement on the obvious algorithms, known as “A-Priori,” from

which many variants have been developed, will be covered here. The next two

sections will discuss certain further improvements. Before discussing the A-

priori Algorithm itself, we begin the section with an outline of the assumptions

about how data is stored and manipulated when searching for frequent itemsets.

Search WWH ::

Custom Search

Home