Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

to be frequent and it is not a maximal pattern. This type of pruning is particularly

effective with a depth-first strategy.

2. During support counting of the item extensions, the support of P ∪ F ( P ) is also

determined. If after support counting, P

F ( P ) turns out to be frequent, then the

entire subtree rooted at node P can be pruned. Note that the projected database

at node P (as in TreeProjection ) is used.

∪

Although lookaheads are also used in the MaxMiner algorithm, it should be pointed

out that the effectiveness of lookaheads is maximized with a depth-first strategy.

This is true of the first of the two aforementioned strategies, in which it is checked

whether P

F ( P ) is a subset of an already existing frequent pattern. This is a because

a depth-first strategy tends to explore the itemsets in dictionary order. In dictionary

order, maximal itemsets are usually explored much earlier than most of their subsets.

For example, for a 10-itemset abcdefghij , only 9 of the 1024 subsets of the itemsets

will be explored before exploring the itemset abscdefghij . These 9 itemsets are the

immediate prefixes of the itemset. When, the longer itemsets are explored early they

become available to prune shorter itemsets.

The following information is stored at each node during the process of construction

of the lexicographic tree:

∪

1. The itemset P at that node.

2. The set of lexicographic tree extensions at that node which are E ( P ).

3. A pointer to the projected transaction set

( Q ), where Q is some ancestor of P

(including itself). The root of the tree points to the entire transaction database.

4. A bitvector containing the information about which transactions contain the item-

set for node P as a subset. The length of this bitvector is equal to the total number

of transactions in

T

( Q ). The value of a bit for a transaction is equal to one, if the

itemset P is a subset of the transaction. Otherwise it is equal to zero. Thus, the

number of 1 bits is equal to the number of transactions in

T

( Q ) which project to

P . The bitvectors are used to make the process of support counting more efficient.

T

After all the projected transactions at a given node have been identified, then find-

ing the subtree rooted at that node is a completely independent itemset generation

problem with a substantially reduced transaction set. The number of transactions at

a node is proportional to the support at that node.

The description in Fig. 2.14 shows how the depth first creation of the lexicographic

tree is performed. The algorithm is described recursively, so that the call from each

node is a completely independent itemset generation problem that finds all frequent

itemsets that are descendants of a node. There are three parameters to the algorithm,

a pointer to the database

, the itemset node N , and the bitvector B . The bitvector

B contains one bit for each transaction in T

T

, and indicates whether or not

the transaction T should be used in finding the frequent extensions of N . A bit for

a transaction T is one, if the itemset at that node is a subset of the corresponding

transaction. The first call to the algorithm is from the null node, the parameter

∈ T

T

is

the entire transaction database. Because each transaction in the database is relevant

to perform the counting, the bitvector B consists of all “one ” values. One property

Frequent Pattern Mining

Search WWH ::

Custom Search

Home