Mining Long Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

with item re-ordering. By post-pruning, it eliminates non-maximal patterns and re-

turns the maximal pattern set. GenMax as proposed in [ 7 ] discovers all maximal

frequent itemsets by performing a depth-first traversal of the pattern lattice with a

backtracking search strategy. In addition, GenMax also adopts a progressive focus-

ing technique to eliminate non-maximal itemsets and uses diffset propagation for

fast frequency checking.

Just like in the breadth-first case, algorithm designers adopting a depth-first

approach have identified certain enumeration order to guarantee unique pattern

generation. gSpan [ 25 ] is such an algorithm proposed for frequent graph pattern

mining.

gSpan solves the redundant pattern candidate generation problem by proposing

a right-most extension technique, which imposes an order of pattern growth by only

allowing edge extension on the right-most path [ 25 ]. A right-most path for a given

graph is the straight path from the starting vertex v 0 to the last vertex v n by a depth-

first search on the graph. To conduct a depth-first search, it is necessary to identify

first a particular node of the graph as the root, which is itself an interesting question

given a graph pattern candidate. gSpan proposes a DFS coding technique to order

all the different root selections on the same graph. The one with the minimum DFS

code is the canonical root of the graph candidate, and only this rooted graph gets

extended in the pattern growth. With all these designs, gSpan is able to guarantee

the non-redundant generation of all graph pattern candidates without compromising

the completeness of the mining result.

Other pattern-growth mining methods proposed along the years include Mafia

[ 6 ] and FP-growth [ 9 ] for itemset mining. In the sequence pattern mining frontier,

pioneers along the pattern-growth line include PrefixSpan [ 21 ], FreeSpan [ 8 ],

TreeMiner [ 28 ], SPADE [ 27 ] and FEQT [ 3 ]. For graph mining, we have MoFa

[ 5 ], FFSM [ 10 ], SPIN [ 22 ], Gaston [ 16 ] and TSMiner [ 12 ] for mining large-scale

topological structures.

For both breadth-first and depth-first approaches, the inherent computational bot-

tleneck lies in the fact that they still need to examine all the frequent pattern candidates

and apply the size constraint to finally find the long patterns. When the total number

of frequent patterns is exponentially huge, all these algorithms have difficulties in

finish mining.

5

Row Enumeration Approach

There has been another line of work which, instead of enumerating patterns explicitly

by enumerating columns, finds frequent patterns by row enumeration. We call this

body of work row enumeration approach, which are based on the following observa-

tions. While the set of algorithms like Close [ 20 ] adopt breadth-first search which

is differerent from those adopting depth-first search such as CLOSET+ [ 24 ] and

CHARM [ 29 ], one thing they have in common is that they all perform pattern enu-

meration by explicitly enumerating the feature sets, which is usually called column

Search WWH ::

Custom Search

Home