Frequent Itemsets - Mining of Massive Datasets

Database Reference

In-Depth Information

As we read baskets, we look at each item in the basket and translate its name into an in-

teger. Next, we use that integer to index into the array of counts, and we add 1 to the integer

found there.

Between the Passes of A-Priori

After the first pass, we examine the counts of the items to determine which of them are fre-

quent as singletons. It might appear surprising that many singletons are not frequent. But

remember that we set the threshold s sufficiently high that we do not get too many frequent

sets; a typical s would be 1% of the baskets. If we think about our own visits to a supermar-

ket, we surely buy certain things more than 1% of the time: perhaps milk, bread, Coke or

Pepsi, and so on. We can even believe that 1% of the customers buy diapers, even though

we may not do so. However, many of the items on the shelves are surely not bought by 1%

of the customers: Creamy Caesar Salad Dressing for example.

For the second pass of A-Priori, we create a new numbering from 1 to m for just the fre-

quent items. This table is an array indexed 1 to n , and the entry for i is either 0, if item i is

not frequent, or a unique integer in the range 1 to m if item i is frequent. We shall refer to

this table as the frequent-items table .

The Second Pass of A-Priori

During the second pass, we count all the pairs that consist of two frequent items. Recall

from Section 6.2.3 that a pair cannot be frequent unless both its members are frequent.

Thus, we miss no frequent pairs. The space required on the second pass is 2 m 2 bytes, rather

than 2 n 2 bytes, if we use the triangular-matrix method for counting. Notice that the renum-

bering of just the frequent items is necessary if we are to use a triangular matrix of the right

size. The complete set of main-memory structures used in the first and second passes is

shown in Fig. 6.3 .

Figure 6.3 Schematic of main-memory use during the two passes of the A-Priori Algorithm

Search WWH ::

Custom Search

Home