Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

Table 2.2 Vertical

representation of transactions.

Note that the support of

itemset ab can be computed

as the length of the

intersection of the tidlists of

a and b

Item

tidlist

a

1,2,3,5

b

1,2,4,5

c

1,2,5

d

1,2,5

e

1,4,5

f

2,3,4

g

3, 4

h

2, 5

until the width of the bitstring is reduced even further by the same multiplicative

factor. This ensures that the bit strings representations are not sparse and wasteful.

The key issue here is that different representations provide different tradeoffs in

terms of memory management and efficiency. Later in this chapter, an approach

called FP-growth will be discussed which uses the trie data structure to achieve

compression of projected transactions for better memory management.

3.3

Vertical Mining Algorithms

The vertical pattern mining algorithms use a vertical representation of the transaction

database to enable more efficient counting. The basic idea of the vertical represen-

tation is that one can express the transaction database as an inverted list. In other

words, for each transaction identifiers, one can have a list of items that are contained

in it. This is referred to as a tidset or tidlist . An example of a vertical representation

of the transactions in Table 2.1 is illustrated in Table 2.2 .

The key idea in vertical pattern mining algorithms is that the support of k -patterns

can be computed by intersection of the underlying tidlists . There are two different

ways in which this can be done.

The support of a k -itemset can be computed as a k -way set intersection of the lists

of the individual items.

The support of a k -itemset can be computed as an intersection of the tidlists two

( k

−

1)-itemsets that join to that k -itemset.

The latter approach is more efficient. The credit for both the notion of vertical tidlists

and the advantages of recursive intersection of tidlists is shared by the Monet [ 56 ]

and the Partition algorithms [ 57 ]. Not all vertical pattern mining algorithms use an

enumeration tree concept to describe the algorithm. Many of the algorithms directly

use joins to generate a ( k

1)-candidate pattern from a frequent k -pattern, though

even a join-based algorithm, such as Apriori , can be explained in terms of an enumer-

ation tree. Many of the later variations of vertical methods use an enumeration tree

concept to explore the lattice of itemsets more carefully and realize the full power of

the vertical approach. The indvidual ensemble component of Savasere et al.'s [ 57 ]

Partition algorithm is the progenitor of all vertical pattern mining algorithms today,

and the original Eclat algorithm is a memory-optimized and candidate partitioned

version of this Apriori-like algorithm.

+

Frequent Pattern Mining

Search WWH ::

Custom Search

Home