Frequent Itemsets - Mining of Massive Datasets

Databases Reference

In-Depth Information

1

2

1

2

1

2

Item

names

to

integers

Item

names

to

integers

Item

names

to

integers

Fre−

Item

quent

counts

items

n

Bitmap 1

Bitmap 2

Second

hash table

Hash table

Data strucrure

for counts

of pairs

for bucket

counts

Pass 1

Pass 2

Pass 3

Figure 6.6: The Multistage Algorithm uses additional hash tables to reduce the

number of candidate pairs

the first pass. However, we must retain the information about which items are

frequent, since we need it on both the second and third passes. During the

second pass, we hash certain pairs of items to buckets of the second hash table.

A pair is hashed only if it meets the two criteria for being counted in the second

pass of PCY; that is, we hash{i, j}if and only if i and j are both frequent,

and the pair hashed to a frequent bucket on the first pass. As a result, the sum

of the counts in the second hash table should be significantly less than the sum

for the first pass. The result is that, even though the second hash table has

only 31/32 of the number of buckets that the first table has, we expect there

to be many fewer frequent buckets in the second hash table than in the first.

After the second pass, the second hash table is also summarized as a bitmap,

and that bitmap is stored in main memory. The two bitmaps together take up

slightly less than 1/16th of the available main memory, so there is still plenty

of space to count the candidate pairs on the third pass. A pair{i, j}is in C 2 if

and only if:

1. i and j are both frequent items.

2.{i, j}hashed to a frequent bucket in the first hash table.

3.{i, j}hashed to a frequent bucket in the second hash table.

The third condition is the distinction between Multistage and PCY.

It might be obvious that it is possible to insert any number of passes between

the first and last in the multistage Algorithm. There is a limiting factor that

Mining of Massive Datasets

Search WWH ::

Custom Search

Home