Database Reference
In-Depth Information
to be frequent and it is not a maximal pattern. This type of pruning is particularly
effective with a depth-first strategy.
2. During support counting of the item extensions, the support of P F ( P ) is also
determined. If after support counting, P
F ( P ) turns out to be frequent, then the
entire subtree rooted at node P can be pruned. Note that the projected database
at node P (as in TreeProjection ) is used.
Although lookaheads are also used in the MaxMiner algorithm, it should be pointed
out that the effectiveness of lookaheads is maximized with a depth-first strategy.
This is true of the first of the two aforementioned strategies, in which it is checked
whether P
F ( P ) is a subset of an already existing frequent pattern. This is a because
a depth-first strategy tends to explore the itemsets in dictionary order. In dictionary
order, maximal itemsets are usually explored much earlier than most of their subsets.
For example, for a 10-itemset abcdefghij , only 9 of the 1024 subsets of the itemsets
will be explored before exploring the itemset abscdefghij . These 9 itemsets are the
immediate prefixes of the itemset. When, the longer itemsets are explored early they
become available to prune shorter itemsets.
The following information is stored at each node during the process of construction
of the lexicographic tree:
1. The itemset P at that node.
2. The set of lexicographic tree extensions at that node which are E ( P ).
3. A pointer to the projected transaction set
( Q ), where Q is some ancestor of P
(including itself). The root of the tree points to the entire transaction database.
4. A bitvector containing the information about which transactions contain the item-
set for node P as a subset. The length of this bitvector is equal to the total number
of transactions in
T
( Q ). The value of a bit for a transaction is equal to one, if the
itemset P is a subset of the transaction. Otherwise it is equal to zero. Thus, the
number of 1 bits is equal to the number of transactions in
T
( Q ) which project to
P . The bitvectors are used to make the process of support counting more efficient.
T
After all the projected transactions at a given node have been identified, then find-
ing the subtree rooted at that node is a completely independent itemset generation
problem with a substantially reduced transaction set. The number of transactions at
a node is proportional to the support at that node.
The description in Fig. 2.14 shows how the depth first creation of the lexicographic
tree is performed. The algorithm is described recursively, so that the call from each
node is a completely independent itemset generation problem that finds all frequent
itemsets that are descendants of a node. There are three parameters to the algorithm,
a pointer to the database
, the itemset node N , and the bitvector B . The bitvector
B contains one bit for each transaction in T
T
, and indicates whether or not
the transaction T should be used in finding the frequent extensions of N . A bit for
a transaction T is one, if the itemset at that node is a subset of the corresponding
transaction. The first call to the algorithm is from the null node, the parameter
T
T
is
the entire transaction database. Because each transaction in the database is relevant
to perform the counting, the bitvector B consists of all “one ” values. One property
Search WWH ::




Custom Search