Databases Reference
In-Depth Information
This binary representation for items, as noted Burdick et al., naturally
extends to itemsets [7]. Suppose we have an integer-list
A
X
for an itemset
X
, the integer-list
A
X∪{j}
is simply the bitwise
AND
of the integer-lists
A
X
and
I
j
. If an itemset has a single item then its integer-list is
I
j
.
The weakness of a the vertical binary representation is the memory spend-
ing, especially in sparse databases. An alternative representation is considering
only non-null integers. This compressed integer-lists could be represented as
an array
CA
of pairs
<s,B
s
>
with 1
≤
s
≤
q
and
B
s
=0.
4.2 Algorithm
CBMine is a breadth-first search algorithm with a VTV organization, consid-
ering compressed integer-lists for itemset representation.
This algorithm iteratively generates a prefix list
PL
k
. The elements of this
list have the format:
<Prefix
k−
1
,CA
Prefix
k−
1
,Suffixes
Prefix
k−
1
>
,where
Prefix
k−
1
is a (
k
-1)-itemset,
CA
Prefix
k−
1
is the corresponding compressed
integer-list, and
Suffixes
Prefix
k−
1
is the set of all su
x items
j
of
k
-itemsets
extended with the same
Prefix
k−
1
,where
j
is lexicographically greater than
every item in the prefix and the
k
-itemsets extended are frequent . This repre-
sentation not only reduces the required memory space to store the integer-lists
but also eliminates the Join step described in (3).
CBMine guarantees a significant memory reduction. Other algorithms with
VTV representation have an integer-list for each itemset, meanwhile CBMine
has an integer-list only for each equivalent class (see Fig. 7).
The Prune step (4) is optimized generating
PL
k
as a sorted list by the
prefix field and, for each element, by the su
x field.
In order to determine the support of an itemset with a compressed integer-
list
CA
, the following expression is considered:
Support
(
CA
)=
<s,B
s
>∈CA
BitCount
(
B
s
)
,
(7)
(a)
(b)
{
i
1
,
i
2
,
i
3
}
i
1
,
i
2
,
i
4
}{
i
1
,
i
2
,
i
5
}
Prefix: {
i
1
,
i
2
} ,
Suffixes: {
i
3
,
i
4,
i
5
}
123
325
.
.
.
65
1287
.
.
.
12
465
23
.
.
.
536
.
.
.
Fig. 7.
(
a
) Others algorithms with VTV representations, (
b
) CBMine