Database Reference
In-Depth Information
6.2
Other Exploration Strategies
The advantage of tree-enumeration strategies is that they facilitate the exploration
of candidates in the tree in an arbitrary order. A method known as Pincer-Search
is proposed in [ 37 ] that combines top-down and bottom-up exploration in “pincer”
fashion to avail of the advantages of both subset and superset pruning. Two primary
observations are used in pincer search:
1. Any subset of a frequent itemset is frequent.
2. Any superset of an infrequent itemset is infrequent.
In pincer-search, top-down and bottom-up exploration are combined and irrelevant
itemsets are pruned using both observations. More details of this approach are dis-
cussed in [ 37 ]. Note that, for sparse transaction data, superset pruning is likely to be
inefficient. Other recent methods have been proposed for long pattern mining with
methods such as “leap search.” These methods are discussed in the chapter on long
pattern mining in this topic.
7
Reducing the Number of Passes
A major challenge in frequent pattern mining is when the data is disk resident. In such
cases, it is desirable to use level-wise methods to ensure that random accesses to disk
are minimized. This is the reason that most of the available algorithms use level-wise
methods, which ensure that the number of passes over the database are bounded by the
size of the longest pattern. Even so, this can be significant, when many long patterns
are present in the database. Therefore, a number of methods have been proposed in the
literature to reduce the number of passes over the data. These methods could be used
in the context of join-based algorithms, tree-based algorithms, or even other classes
of frequent pattern mining methods. These correspond to combining the level-wise
database passes, using sampling, and using a preprocess-once-query-many paradigm.
7.1
Combining Passes
The earliest work on combining passes was proposed in the original Apriori algorithm
[ 1 ]. The key idea in combing passes is that it is possible to use joins to create
candidates of higher order than ( k
+
1) in a single pass. For example, ( k
+
2)-
candidates can be created from ( k
+
1)-candidates before actual validation of the
( k
2)
can be validated together in a single pass over the data. Although such an approach
reduces the number of passes over the data, it has the downside that the number of
spurious ( k
+
1)-candidates over the data. Then, the candidates of size ( k
+
1) and ( k
+
1) candidates were not
confirmed to be frequent before they were joined. Therefore, the saving of database
+
2) candidates will be far larger because the ( k
+
Search WWH ::




Custom Search