Database Reference
In-Depth Information
Algorithm 9.1 Computation of the large itemsets and the original negative border.
1: procedure NB-A PRIORI (D O , msup)
2:
F 1 G ET L ARGE I TEMS (D O , msup)
3:
for k = 2; F k1 6=?; k++ do
4:
C k A PRIORI G EN (F k1 ; msup)
5:
for each t 2 T i do
. for all transactions in D O
6:
C t subset(C k , t)
. get candidate subsets of t
7: for each candidate c 2C t do
8: c:count++
9: end for
10: end for
11: F k fc 2C k jc:count msupg
12: Bd (F D O ) fc 2C k jc:count < msupg
13: end for
14: end procedure
15: procedure G ET L ARGE I TEMS (D O , msup)
16:
for T i 2D O do
. traverse all transactions
17:
for x 2 T i do
. traverse all items in the transaction
18:
x:count++
19:
end for
20:
end for
21:
for each item x do
22:
if x:count msup then
. item x is frequent
23: x 2 F k
24: else
25:
x 2Bd (F D O )
. add this item to the negative border
26: end if
27: end for
28: end procedure
be frequent. In this algorithm, we use notation F k to refer to large k-itemsets from
the original database D O .
Having identified the original negative border, the next step is to compute the
original positive border Bd + (F D O ). Algorithm 9.2, presents a level-wise (in the
length of the large itemsets) approach to accomplish this computation. Assume that
F D O is the set of frequent itemsets identified by using Apriori. For each itemset
in F D O we associate a counter, initialized to zero. The algorithm first sorts these
itemsets in decreasing length and then for all itemsets of the same length, say k, it
identifies their (k-1) large subsets and increases their counters by one. The value of
k iterates from the length of the largest identified frequent itemset down to 1. Fi-
nally, the algorithm performs a one-pass through all the counters of the itemsets and
collects the large itemsets having a value of zero in the associated counter. These,
constitute the positive border Bd + (F D O ). Due to its very nature, this algorithm is
suitable for computing both the original and the revised positive borders. For this
reason, we use notation Bd + (F) to abstract the reference to the particular border
that is computed, namely the original or the revised border.
A way of computing the revised negative border that adheres to an exact hid-
ing solution, in which F 0 D = F D O S max , is presented in Algorithm 9.3. In this
 
Search WWH ::




Custom Search