Database Reference
In-Depth Information
be lower bounded as follows:
sup ( A B )
sup ( A )
+ sup ( B )
sup ( A B )
(2.1)
This condition follows directly from set-theoretic considerations. Thus, the support
of ( k
1)-candidates can be lower bounded in terms of the (already computed) support
values of itemsets of length k or less. If the computed value on the right-hand side
is greater than the required minimum support, then the counting of the candidate
does not need to be performed explicitly, and therefore considerable savings can be
achieved. An example of a method which uses this kind of pruning is the Apriori_LB
method [ 10 ].
Another interesting rule is that if the support of an itemset X is the same as that
of X
+
Y , then for any superset X
X , it is the case that the support of the itemset
X is the same as that of X
Y . This rule can be shown directly as a corollary of the
equation above. This is very useful in a variety of frequent pattern mining algorithms.
For example, once the support of X
has been shown to be the same as that of
X , then, for any superset X of X , it is no longer necessary to explicitly compute
the support of X ∪{
∪{
i
}
, after the support of X has already been computed. Such
optimizations have been shown to be quite effective in the context of many frequent
pattern mining algorithms [ 13 , 51 , 17 ]. As discussed later, this trick is not exclusive
to join-based algorithms, and is often used effectively in tree-based algorithms such
as MaxMiner , and MAFIA .
i
}
2.5
Hypercube Decomposition
One feasible way to reduce the computation cost of support counting is to find support
of multiple frequent patterns at one time. LCM [ 66 ] devise a technique referred to as
hypercube decomposition in this purpose. The multiple itemsets obtained at one time,
comprise a hypercube in the itemset lattice. Suppose that P is a frequent pattern,
tidset ( P ) contains the transactions that P is part of, and tail( P ) denotes the latest
item extension to the itemset P . H ( P ) is the set of items e satisfying e>tail ( P )
and tidset ( P )
=
tidset ( P
e ). The set H ( P ) is referred to as the hypercube set.
Then, for any P
P
is frequent. The work in [ 66 ] uses this property in the candidate generation phase.
For two itemsets P and P
P )
H ( P ), tidset ( P
=
tidset ( P ) is true, and P
P , we say that P is between P and P
P if
P
P . In the phase with respect to P , we output all P between P
P
P
and P
H ( P ). This technique saves significant time in counting.
3
Tree-Based Algorithms
The tree-based algorithm is based on set-enumeration concepts. The candidates can
be explored with the use of a subgraph of the lattice of itemsets (see Fig. 2.2 ), which
is also referred to as the lexicographic tree or enumeration tree [ 5 ]. These terms will,
Search WWH ::




Custom Search