A Compressed Vertical Binary Algorithm for Mining Frequent Patterns - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

TID

Item-lists

Item-vectors

1

1235

11101

01111

00111

11111

2

2345

3

345

4

12345

HIL

HIV

Tid-lists

Tid-vectors

1

2

3

4

5

12 3 4 5

1

11

111

1

0

1

11

2

1

0

00

11

4

22

3

2

4

3

4

3

11 1 11

4

VTL

VTV

Fig. 3. Examples of database layouts

computing the supports of itemsets is simpler and faster with the vertical

layout (VTL or VTV) since it involves only the intersections of tid-lists or

tid-vectors.

In a general point of view, rule mining algorithms employ a combination

of a traverse strategy (breadth-first or depth-first) and a form of the database

layout. Examples of algorithms for horizontal mining with a breadth strategy

previously presented are a priori, a prioriTID, DIC [3] and with a depth strat-

egy are different version applying FP-Trees [1]. Other algorithms considering

a VTL layout are Partition, with a breadth strategy [10], and Eclat, with a

depth strategy [22].

In this chapter we evaluate a compressed form of the VTV layout, improv-

ing the performance of the itemset generation, applicable to those algorithms

with an a priori-like approach.

The problem of mining frequent itemsets was first introduced by Agrawal

et al. [2]. To achieve e cient mining frequent patterns, an antimonotonic

property of frequent itemsets, called the a priori heuristic, was formulated

in [3]. The a priori heuristic can dramatically prune candidate itemsets.

A priori is a breadth-first search algorithm, with a HIL organization, that

iteratively generates two kinds of sets: C k and L k . The set L k contains the

large itemsets of size k ( k -itemsets). Meanwhile, C k is the set of candidate

k -itemsets, representing a superset of L k . This process continues until a null

set L k is generated.

The set L k is obtained scanning the database and determining the support

for each candidate k -itemset in C k . The set C k is generated from L k− 1 with

the following procedure:

C k =

{

c

|

Join ( c,L k− 1 )

∧

Prune ( c,L k− 1 )

}

(2)

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home