Databases Reference
In-Depth Information
6.3.4
Exercises for Section 6.3
Exercise 6.3.1 : Here is a collection of twelve baskets. Each contains three of
the six items 1 through 6.
{1, 2, 3}{2, 3, 4}{3, 4, 5}{4, 5, 6}
{1, 3, 5}{2, 4, 6}{1, 3, 4}{2, 4, 5}
{3, 5, 6}{1, 2, 4}{2, 3, 5}{3, 4, 6}
Suppose the support threshold is 4. On the first pass of the PCY Algorithm
we use a hash table with 11 buckets, and the set{i, j}is hashed to bucket i×j
mod 11.
(a) By any method, compute the support for each item and each pair of items.
(b) Which pairs hash to which buckets?
(c) Which buckets are frequent?
(d) Which pairs are counted on the second pass of the PCY Algorithm?
Exercise 6.3.2 : Suppose we run the Multistage Algorithm on the data of
Exercise 6.3.1, with the same support threshold of 4. The first pass is the same
as in that exercise, and for the second pass, we hash pairs to nine buckets,
using the hash function that hashes{i, j}to bucket i + j mod 9. Determine
the counts of the buckets on the second pass. Does the second pass reduce the
set of candidate pairs? Note that all items are frequent, so the only reason a
pair would not be hashed on the second pass is if it hashed to an infrequent
bucket on the first pass.
Exercise 6.3.3 : Suppose we run the Multihash Algorithm on the data of
Exercise 6.3.1. We shall use two hash tables with five buckets each. For one,
the set{i, j}, is hashed to bucket 2i+ 3j + 4 mod 5, and for the other, the set is
hashed to i + 4j mod 5. Since these hash functions are not symmetric in i and
j, order the items so that i < j when evaluating each hash function. Determine
the counts of each of the 10 buckets. How large does the support threshold
have to be for the Multistage Algorithm to eliminate more pairs than the PCY
Algorithm would, using the hash table and function described in Exercise 6.3.1?
! Exercise 6.3.4 : Suppose we perform the PCY Algorithm to find frequent
pairs, with market-basket data meeting the following specifications:
1. The support threshold is 10,000.
2. There are one million items, represented by the integers 0, 1, . . . , 999999.
3. There are 250,000 frequent items, that is, items that occur 10,000 times
or more.
Search WWH ::




Custom Search