Database Reference
In-Depth Information
￿
The concept of lookaheads is defined. Let F ( P ) be the set of candidate items
that might extend node P . Before counting, it is checked whether F F ( P )is
a subset of any of the frequent patterns found so far. If such is indeed the case,
then it is known that the entire subtree rooted at P is frequent, and can be pruned
from consideration (for maximal pattern mining). During counting the support of
individual item extensions of P , the support of P
F ( P ) is also determined. If
the set P
F ( P ) is frequent, then it is known that all itemsets in the entire subset
rooted at that node are frequent. Therefore, the tree does not need to be explored
further, and can be pruned.
￿
The support lower bounding trick discussed earlier can be used to quickly de-
termine patterns which are frequent without explicit counting. The counts of
extensions of nodes can be determined without counting in many cases, where
the count does not change by extending an item.
It has been shown in [ 10 ], that these simple optimizations can improve over the
original Apriori algorithm by orders of magnitude.
5.2.2
DepthProject Algorithm
The DepthProject algorithm is based on the notion of the lexicographic tree, defined
in [ 5 ]. Unlike TreeProjection , the approach aggressively explores the candidates
in a depth-first strategy both to ensure better pruning and faster counting. As in
TreeProjection , the database is recursively projected down the lexicographic tree to
ensure more efficient counting. This kind of projection ensures that the counting
information for k -candidates is reused for ( k +
1)-candidates, as in the case of
FP-growth .
For the case of the DepthProject method [ 4 ], the lexicographic tree is explored in
depth-first order to maximize the advantage of lookaheads in which entire subtrees
can be pruned because it is known that all patterns in them are frequent. The overall
pseudocode for the depth-first strategy is illustrated in Fig. 2.14 . The pseudocodes for
candidate generation and counting are not provided because they are similar to the
previously discussed algorithms. However, one important distinction in counting is
that projected databases are used for counting. This is similar to the FP-growth class
of algorithms. Note that the recursive transaction projection is particularly effective
with a depth-first strategy because a smaller number of projected databases need to
be stored along a path in the tree, as compared to the breadth of the tree.
To reduce the overhead of counting long patterns, the notion of lookaheads are
used. At any node P of the tree, let F ( P ) be its possible (candidate) item extensions.
Then, it is checked whether P
F ( P ) is frequent in two ways:
1. Before counting the support of the individual extensions of P (i.e.,
{
P
∪{
i
}
:
}
F ( P ) occurs as subset of a frequent
itemset that has already been discovered earlier during depth-first exploration. If
such is the case, then the entire subtree rooted at P is pruned because it is known
i
F ( P )
), it is checked whether P
Search WWH ::




Custom Search