Database Reference
In-Depth Information
As we read baskets, we look at each item in the basket and translate its name into an in-
teger. Next, we use that integer to index into the array of counts, and we add 1 to the integer
found there.
Between the Passes of A-Priori
After the first pass, we examine the counts of the items to determine which of them are fre-
quent as singletons. It might appear surprising that many singletons are not frequent. But
remember that we set the threshold s sufficiently high that we do not get too many frequent
sets; a typical s would be 1% of the baskets. If we think about our own visits to a supermar-
ket, we surely buy certain things more than 1% of the time: perhaps milk, bread, Coke or
Pepsi, and so on. We can even believe that 1% of the customers buy diapers, even though
we may not do so. However, many of the items on the shelves are surely not bought by 1%
of the customers: Creamy Caesar Salad Dressing for example.
For the second pass of A-Priori, we create a new numbering from 1 to m for just the fre-
quent items. This table is an array indexed 1 to n , and the entry for i is either 0, if item i is
not frequent, or a unique integer in the range 1 to m if item i is frequent. We shall refer to
this table as the frequent-items table .
The Second Pass of A-Priori
During the second pass, we count all the pairs that consist of two frequent items. Recall
from Section 6.2.3 that a pair cannot be frequent unless both its members are frequent.
Thus, we miss no frequent pairs. The space required on the second pass is 2 m 2 bytes, rather
than 2 n 2 bytes, if we use the triangular-matrix method for counting. Notice that the renum-
bering of just the frequent items is necessary if we are to use a triangular matrix of the right
size. The complete set of main-memory structures used in the first and second passes is
shown in Fig. 6.3 .
Figure 6.3 Schematic of main-memory use during the two passes of the A-Priori Algorithm
Search WWH ::




Custom Search