Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

take place during this step. The same process is repeated two more times with the

second and third order bits. At the end of three passes, each bucket contains the

support count for the appropriate itemset, where the '0' for the itemset is replaced

by a “don't care” which is represented by a '*'. Note that the number of transactions

in this example is 27. This is represented by the entry for the bucket ***. Only two

transactions contain all three items that is represented by the bucket 111.

The projection-based methods were shown to have an order of magnitude im-

provement over the MaxMiner algorithm. The depth-first approach has subsequently

been used in the context of many tree-based algorithms. Other examples of such

algorithms include those in [ 17 , 18 , 14 ]. Among these, the MAFIA algorithm [ 14 ]

is discussed in some detail in the next subsection. An approach which varies on

the projection methodology, and uses opportunistic projection is discussed in [ 38 ].

This algorithm opportunistically chooses between array-based and tree-based rep-

resentations to represent projected transaction subsets. Such an approach has been

shown to be more efficient than many state of the art methods such as the FP-Growth

method. Other variations of tree-based algorithms have also been proposed [ 70 ] that

use different strategies in tree exploration.

5.2.3

MAFIA Algorithm

The MAFIA algorithm proposed in [ 14 ] shares a number of similarities to the Depth-

Project approach, though it uses a bitmap based approach for counting, rather than

the use of a projected transaction database. In the bitmap-based approach, a sequence

of bits is maintained for each itemset that corresponds to whether or not that transac-

tion contains that particular item. Sparse representations (such as a list of transaction

identifiers) may also be used, when the fraction of transactions containing the itemset

is small. Note that such an approach may be considered a special case of database

projection [ 5 ], in which vertical projection is used but horizontal projection is not.

This has the advantage of requiring less memory, but it reuses a smaller fraction of

the counting information from higher level nodes. A number of other pruning opti-

mizations have also been proposed in this work that further improve the effectiveness

of the algorithm. In particular, it has been pointed out that when the support of the

extension of a node is the same as that of its parent, then that subtree can be pruned

away, because of the counts of all the itemsets in the subtree can be derived from those

of other itemsets in the data. This is the same as the support lower bounding trick

discussed in Sect. 2.4 , and also used in MaxMiner for pruning. Thus, the approach

in [ 14 ] uses many of the same strategies used in MaxMiner and TreeProjection ,but

with in a different combination, and with some variations on specific implementation

details.

Search WWH ::

Custom Search

Home