Database Reference
In-Depth Information
Table 2.2 Vertical
representation of transactions.
Note that the support of
itemset ab can be computed
as the length of the
intersection of the tidlists of
a and b
Item
tidlist
a
1,2,3,5
b
1,2,4,5
c
1,2,5
d
1,2,5
e
1,4,5
f
2,3,4
g
3, 4
h
2, 5
until the width of the bitstring is reduced even further by the same multiplicative
factor. This ensures that the bit strings representations are not sparse and wasteful.
The key issue here is that different representations provide different tradeoffs in
terms of memory management and efficiency. Later in this chapter, an approach
called FP-growth will be discussed which uses the trie data structure to achieve
compression of projected transactions for better memory management.
3.3
Vertical Mining Algorithms
The vertical pattern mining algorithms use a vertical representation of the transaction
database to enable more efficient counting. The basic idea of the vertical represen-
tation is that one can express the transaction database as an inverted list. In other
words, for each transaction identifiers, one can have a list of items that are contained
in it. This is referred to as a tidset or tidlist . An example of a vertical representation
of the transactions in Table 2.1 is illustrated in Table 2.2 .
The key idea in vertical pattern mining algorithms is that the support of k -patterns
can be computed by intersection of the underlying tidlists . There are two different
ways in which this can be done.
￿
The support of a k -itemset can be computed as a k -way set intersection of the lists
of the individual items.
￿
The support of a k -itemset can be computed as an intersection of the tidlists two
( k
1)-itemsets that join to that k -itemset.
The latter approach is more efficient. The credit for both the notion of vertical tidlists
and the advantages of recursive intersection of tidlists is shared by the Monet [ 56 ]
and the Partition algorithms [ 57 ]. Not all vertical pattern mining algorithms use an
enumeration tree concept to describe the algorithm. Many of the algorithms directly
use joins to generate a ( k
1)-candidate pattern from a frequent k -pattern, though
even a join-based algorithm, such as Apriori , can be explained in terms of an enumer-
ation tree. Many of the later variations of vertical methods use an enumeration tree
concept to explore the lattice of itemsets more carefully and realize the full power of
the vertical approach. The indvidual ensemble component of Savasere et al.'s [ 57 ]
Partition algorithm is the progenitor of all vertical pattern mining algorithms today,
and the original Eclat algorithm is a memory-optimized and candidate partitioned
version of this Apriori-like algorithm.
+
 
Search WWH ::




Custom Search