Frequent Pattern Mining Algorithms: A Survey - Frequent Pattern Mining

Database Reference

In-Depth Information

Let Q be the immediate ancestor of the itemset P in the lexicographic tree.

The set of prospective branches of a node P is defined to be those items in E ( Q )

which occur lexicographically after the node P . These are the possible frequent

lexicographic extensions of P . We denote this set by F ( P ). Thus, we have the

following relationship: E ( P )

⊆

F ( P )

⊂

E ( Q ). The value of E ( P ) in Fig. 2.5 , when

P

=

ab is

{

c , d

}

. The value of F ( P ) for P

=

ab is

{

c , d , f

}

, and for P

=

af , F ( P )

is empty.

It is important to point out that virtually all non-maximal and maximal algorithms,

starting from Apriori , can be considered enumeration-tree methods. In fact, there are

few frequent pattern mining algorithms which do not use the enumeration tree, or a

subset thereof (in maximal pattern mining) for frequent itemset generation. However,

the order of exploration of the different algorithms of the lexicographic tree is quite

different. For example, Apriori uses a breadth-first strategy, whereas other algorithms

discussed later in this chapter use a depth-first strategy. Some methods are explicit

about the relationship about the candidate generation process with the enumeration

tree, whereas others, such as Apriori , are not. For example, by examining Fig. 2.4 ,it

is evident that Apriori candidates can be generated by joining two frequent siblings of

a lexicographic tree. In fact, all candidates can be generated in an exhaustive and non-

redundant way by joining frequent siblings. For example, the two itemsets acdf h

and acdfg are siblings, because they are children of the node acdf . By joining

them, one obtains the candidate pattern acdfgh . Thus, while the Apriori algorithm

is a join-based algorithm, it can also be explained in terms of the enumeration tree.

Parts of the enumeration tree may be removed by some of the algorithms by

pruning methods. For example, the Apriori algorithm uses a levelwise pruning trick.

For maximal pattern mining the advantages gained from pruning tricks can be very

significant. Therefore, the number of candidates in the execution tree of different

algorithms is different only because of pruning optimization tricks. However, some

methods are able to achieve better counting strategies by using the structure of the

enumeration tree to avoid re-doing the counting work already done for k -candidates

to ( k

1)-candidates. Therefore, explicitly introducing the enumeration tree is helpful

because it allows a more flexible way to visualize candidate exploration strategies

than join-based methods. The explicit introduction of the enumeration tree also helps

in understanding whether the gains in different algorithms arise as a result of fewer

number of candidates, or whether they arise as a result of better counting strategies.

+

3.1

AIS Algorithm

The original AIS algorithm [ 2 ] is a simple version of the lexicographic-tree algorithm,

though it is not directly presented as such. In this approach, the tree is constructed

in levelwise fashion and the corresponding itemsets at a given level are counted

with the use of the transaction database. The algorithm does not use any specific

optimizations to improve the efficiency of the counting process. As will be discussed

later, a variety of methods can be used to further improve the efficiency of tree-based

algorithms. Thus, this is a primitive approach that explores the entire search space

with no optimization.

Frequent Pattern Mining

Search WWH ::

Custom Search

Home