A Compressed Vertical Binary Algorithm for Mining Frequent Patterns - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

Fig. 4. Pseudocode of a priori Algorithm

where:

Join (

{

i 1 ,...,i k }

,L k− 1 )

≡{

i 1 ,..,i k− 2 ,i k− 1 }∈

L k− 1

(3)

∧{

i 1 ,..,i k− 2 ,i k }∈

L k− 1

,

Prune ( c,L k− 1 )

≡∀

s [ s

⊂

c ]

∧|

s

|

= k

−

1

→

s

∈

L k− 1

.

(4)

Observe that the Join step (3) takes two ( k -1)-itemsets of a same equiva-

lence class to generate a k -itemsets.

The a priori algorithm is presented in Fig. 4.

The procedure a priori gen used in step 3 is described in (2).

3 Related Work

The vertical binary representations (VTV) and the corresponding support

counting method have been investigated by other researchers [7, 8, 10, 12, 22].

Zaki et al. proposed several algorithms using vertical binary representa-

tions in 1997 [22]. Their improvements are obtained clustering the database

and applying an a priori-like method with simple tid-vectors.

Gardarin et al. proposed two breadth-first search algorithms using vertical

binary representations named N-BM and H-BM in 1998 [8]. N-BM consid-

ers simple (uncompressed) vertical binary representations for itemsets. Mean-

while, H-BM uses, besides, an auxiliary bit-vector, where each bit represents

a group of bits of the original bit-vector. In order to save the memory, every

1-itemset has both, while every large itemset keeps only the auxiliary bit-

vector. H-BM first performs the AND between auxiliary bit-vectors and only

non-zero groups are considered to the final count. However, as in large itemsets

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home