Database Reference
In-Depth Information
Let Q be the immediate ancestor of the itemset P in the lexicographic tree.
The set of prospective branches of a node P is defined to be those items in E ( Q )
which occur lexicographically after the node P . These are the possible frequent
lexicographic extensions of P . We denote this set by F ( P ). Thus, we have the
following relationship: E ( P )
F ( P )
E ( Q ). The value of E ( P ) in Fig. 2.5 , when
P
=
ab is
{
c , d
}
. The value of F ( P ) for P
=
ab is
{
c , d , f
}
, and for P
=
af , F ( P )
is empty.
It is important to point out that virtually all non-maximal and maximal algorithms,
starting from Apriori , can be considered enumeration-tree methods. In fact, there are
few frequent pattern mining algorithms which do not use the enumeration tree, or a
subset thereof (in maximal pattern mining) for frequent itemset generation. However,
the order of exploration of the different algorithms of the lexicographic tree is quite
different. For example, Apriori uses a breadth-first strategy, whereas other algorithms
discussed later in this chapter use a depth-first strategy. Some methods are explicit
about the relationship about the candidate generation process with the enumeration
tree, whereas others, such as Apriori , are not. For example, by examining Fig. 2.4 ,it
is evident that Apriori candidates can be generated by joining two frequent siblings of
a lexicographic tree. In fact, all candidates can be generated in an exhaustive and non-
redundant way by joining frequent siblings. For example, the two itemsets acdf h
and acdfg are siblings, because they are children of the node acdf . By joining
them, one obtains the candidate pattern acdfgh . Thus, while the Apriori algorithm
is a join-based algorithm, it can also be explained in terms of the enumeration tree.
Parts of the enumeration tree may be removed by some of the algorithms by
pruning methods. For example, the Apriori algorithm uses a levelwise pruning trick.
For maximal pattern mining the advantages gained from pruning tricks can be very
significant. Therefore, the number of candidates in the execution tree of different
algorithms is different only because of pruning optimization tricks. However, some
methods are able to achieve better counting strategies by using the structure of the
enumeration tree to avoid re-doing the counting work already done for k -candidates
to ( k
1)-candidates. Therefore, explicitly introducing the enumeration tree is helpful
because it allows a more flexible way to visualize candidate exploration strategies
than join-based methods. The explicit introduction of the enumeration tree also helps
in understanding whether the gains in different algorithms arise as a result of fewer
number of candidates, or whether they arise as a result of better counting strategies.
+
3.1
AIS Algorithm
The original AIS algorithm [ 2 ] is a simple version of the lexicographic-tree algorithm,
though it is not directly presented as such. In this approach, the tree is constructed
in levelwise fashion and the corresponding itemsets at a given level are counted
with the use of the transaction database. The algorithm does not use any specific
optimizations to improve the efficiency of the counting process. As will be discussed
later, a variety of methods can be used to further improve the efficiency of tree-based
algorithms. Thus, this is a primitive approach that explores the entire search space
with no optimization.
Search WWH ::




Custom Search