Database Reference
In-Depth Information
Algorithm 9.1 Computation of the large itemsets and the original negative border.
1: procedure NB-A
PRIORI
(D
O
, msup)
2:
F
1
G
ET
L
ARGE
I
TEMS
(D
O
, msup)
3:
for k = 2; F
k1
6=?; k++ do
4:
C
k
A
PRIORI
G
EN
(F
k1
; msup)
5:
for each t 2 T
i
do
. for all transactions in D
O
6:
C
t
subset(C
k
, t)
. get candidate subsets of t
7: for each candidate c 2C
t
do
8: c:count++
9: end for
10: end for
11: F
k
fc 2C
k
jc:count msupg
12: Bd
(F
D
O
) fc 2C
k
jc:count < msupg
13: end for
14: end procedure
15: procedure G
ET
L
ARGE
I
TEMS
(D
O
, msup)
16:
for T
i
2D
O
do
. traverse all transactions
17:
for x 2 T
i
do
. traverse all items in the transaction
18:
x:count++
19:
end for
20:
end for
21:
for each item x do
22:
if x:count msup then
. item x is frequent
23: x 2 F
k
24: else
25:
x 2Bd
(F
D
O
)
. add this item to the negative border
26: end if
27: end for
28: end procedure
be frequent. In this algorithm, we use notation F
k
to refer to large k-itemsets from
the original database D
O
.
Having identified the original negative border, the next step is to compute the
original positive border Bd
+
(F
D
O
). Algorithm 9.2, presents a level-wise (in the
length of the large itemsets) approach to accomplish this computation. Assume that
F
D
O
is the set of frequent itemsets identified by using Apriori. For each itemset
in F
D
O
we associate a counter, initialized to zero. The algorithm first sorts these
itemsets in decreasing length and then for all itemsets of the same length, say k, it
identifies their (k-1) large subsets and increases their counters by one. The value of
k iterates from the length of the largest identified frequent itemset down to 1. Fi-
nally, the algorithm performs a one-pass through all the counters of the itemsets and
collects the large itemsets having a value of zero in the associated counter. These,
constitute the positive border Bd
+
(F
D
O
). Due to its very nature, this algorithm is
suitable for computing both the original and the revised positive borders. For this
reason, we use notation Bd
+
(F) to abstract the reference to the particular border
that is computed, namely the original or the revised border.
A way of computing the revised negative border that adheres to an exact hid-
ing solution, in which F
0
D
= F
D
O
S
max
, is presented in Algorithm 9.3. In this