Databases Reference
In-Depth Information
This binary representation for items, as noted Burdick et al., naturally
extends to itemsets [7]. Suppose we have an integer-list A X for an itemset
X , the integer-list A X∪{j} is simply the bitwise AND of the integer-lists A X
and I j . If an itemset has a single item then its integer-list is I j .
The weakness of a the vertical binary representation is the memory spend-
ing, especially in sparse databases. An alternative representation is considering
only non-null integers. This compressed integer-lists could be represented as
an array CA of pairs <s,B s > with 1
s
q and B s
=0.
4.2 Algorithm
CBMine is a breadth-first search algorithm with a VTV organization, consid-
ering compressed integer-lists for itemset representation.
This algorithm iteratively generates a prefix list PL k . The elements of this
list have the format: <Prefix k− 1 ,CA Prefix k− 1 ,Suffixes Prefix k− 1 > ,where
Prefix k− 1 is a ( k -1)-itemset, CA Prefix k− 1 is the corresponding compressed
integer-list, and Suffixes Prefix k− 1 is the set of all su x items j of k -itemsets
extended with the same Prefix k− 1 ,where j is lexicographically greater than
every item in the prefix and the k -itemsets extended are frequent . This repre-
sentation not only reduces the required memory space to store the integer-lists
but also eliminates the Join step described in (3).
CBMine guarantees a significant memory reduction. Other algorithms with
VTV representation have an integer-list for each itemset, meanwhile CBMine
has an integer-list only for each equivalent class (see Fig. 7).
The Prune step (4) is optimized generating PL k as a sorted list by the
prefix field and, for each element, by the su x field.
In order to determine the support of an itemset with a compressed integer-
list CA , the following expression is considered:
Support ( CA )=
<s,B s >∈CA
BitCount ( B s ) ,
(7)
(a)
(b)
{ i 1 , i 2 , i 3 } i 1 , i 2 , i 4 }{ i 1 , i 2 , i 5 }
Prefix: { i 1 , i 2 } ,
Suffixes: { i 3 , i 4, i 5 }
123
325
.
.
.
65
1287
.
.
.
12
465
23
.
.
.
536
.
.
.
Fig. 7. ( a ) Others algorithms with VTV representations, ( b ) CBMine
Search WWH ::




Custom Search