Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

6.2

Other Exploration Strategies

The advantage of tree-enumeration strategies is that they facilitate the exploration

of candidates in the tree in an arbitrary order. A method known as Pincer-Search

is proposed in [ 37 ] that combines top-down and bottom-up exploration in “pincer”

fashion to avail of the advantages of both subset and superset pruning. Two primary

observations are used in pincer search:

1. Any subset of a frequent itemset is frequent.

2. Any superset of an infrequent itemset is infrequent.

In pincer-search, top-down and bottom-up exploration are combined and irrelevant

itemsets are pruned using both observations. More details of this approach are dis-

cussed in [ 37 ]. Note that, for sparse transaction data, superset pruning is likely to be

inefficient. Other recent methods have been proposed for long pattern mining with

methods such as “leap search.” These methods are discussed in the chapter on long

pattern mining in this topic.

7

Reducing the Number of Passes

A major challenge in frequent pattern mining is when the data is disk resident. In such

cases, it is desirable to use level-wise methods to ensure that random accesses to disk

are minimized. This is the reason that most of the available algorithms use level-wise

methods, which ensure that the number of passes over the database are bounded by the

size of the longest pattern. Even so, this can be significant, when many long patterns

are present in the database. Therefore, a number of methods have been proposed in the

literature to reduce the number of passes over the data. These methods could be used

in the context of join-based algorithms, tree-based algorithms, or even other classes

of frequent pattern mining methods. These correspond to combining the level-wise

database passes, using sampling, and using a preprocess-once-query-many paradigm.

7.1

Combining Passes

The earliest work on combining passes was proposed in the original Apriori algorithm

[ 1 ]. The key idea in combing passes is that it is possible to use joins to create

candidates of higher order than ( k

+

1) in a single pass. For example, ( k

+

2)-

candidates can be created from ( k

+

1)-candidates before actual validation of the

( k

2)

can be validated together in a single pass over the data. Although such an approach

reduces the number of passes over the data, it has the downside that the number of

spurious ( k

+

1)-candidates over the data. Then, the candidates of size ( k

+

1) and ( k

+

1) candidates were not

confirmed to be frequent before they were joined. Therefore, the saving of database

+

2) candidates will be far larger because the ( k

+

Frequent Pattern Mining

Search WWH ::

Custom Search

Home