Databases Reference
In-Depth Information
Exercise 6.1.3 : Suppose there are 100 items, numbered 1 to 100, and also 100
baskets, also numbered 1 to 100. Item i is in basket b if and only if b divides i
with no remainder. For example, basket 12 consists of items
{12, 24, 36, 48, 60, 72, 84, 96}
Repeat Exercise 6.1.1 for this data.
! Exercise 6.1.4 : This question involves data from which nothing interesting
can be learned about frequent itemsets, because there are no sets of items that
are correlated. Suppose the items are numbered 1 to 10, and each basket is
constructed by including item i with probability 1/i, each decision being made
independently of all other decisions. That is, all the baskets contain item 1,
half contain item 2, a third contain item 3, and so on. Assume the number of
baskets is su ciently large that the baskets collectively behave as one would
expect statistically. Let the support threshold be 1% of the baskets. Find the
frequent itemsets.
Exercise 6.1.5 : For the data of Exercise 6.1.1, what is the confidence of the
following association rules?
(a){5, 7}→2.
(b){2, 3, 4}→5.
Exercise 6.1.6 : For the data of Exercise 6.1.3, what is the confidence of the
following association rules?
(a){24, 60}→8.
(b){2, 3, 4}→5.
!! Exercise 6.1.7 : Describe all the association rules that have 100% confidence
for the market-basket data of:
(a) Exercise 6.1.1.
(b) Exercise 6.1.3.
! Exercise 6.1.8 : Prove that in the data of Exercise 6.1.4 there are no interest-
ing association rules; i.e., the interest of every association rule is 0.
6.2
Market Baskets and the A-Priori Algorithm
We shall now begin a discussion of how to find frequent itemsets or information
derived from them, such as association rules with high support and confidence.
The original improvement on the obvious algorithms, known as “A-Priori,” from
which many variants have been developed, will be covered here. The next two
sections will discuss certain further improvements. Before discussing the A-
priori Algorithm itself, we begin the section with an outline of the assumptions
about how data is stored and manipulated when searching for frequent itemsets.
Search WWH ::




Custom Search