Database Reference
In-Depth Information
with item re-ordering. By post-pruning, it eliminates non-maximal patterns and re-
turns the maximal pattern set. GenMax as proposed in [ 7 ] discovers all maximal
frequent itemsets by performing a depth-first traversal of the pattern lattice with a
backtracking search strategy. In addition, GenMax also adopts a progressive focus-
ing technique to eliminate non-maximal itemsets and uses diffset propagation for
fast frequency checking.
Just like in the breadth-first case, algorithm designers adopting a depth-first
approach have identified certain enumeration order to guarantee unique pattern
generation. gSpan [ 25 ] is such an algorithm proposed for frequent graph pattern
mining.
gSpan solves the redundant pattern candidate generation problem by proposing
a right-most extension technique, which imposes an order of pattern growth by only
allowing edge extension on the right-most path [ 25 ]. A right-most path for a given
graph is the straight path from the starting vertex v 0 to the last vertex v n by a depth-
first search on the graph. To conduct a depth-first search, it is necessary to identify
first a particular node of the graph as the root, which is itself an interesting question
given a graph pattern candidate. gSpan proposes a DFS coding technique to order
all the different root selections on the same graph. The one with the minimum DFS
code is the canonical root of the graph candidate, and only this rooted graph gets
extended in the pattern growth. With all these designs, gSpan is able to guarantee
the non-redundant generation of all graph pattern candidates without compromising
the completeness of the mining result.
Other pattern-growth mining methods proposed along the years include Mafia
[ 6 ] and FP-growth [ 9 ] for itemset mining. In the sequence pattern mining frontier,
pioneers along the pattern-growth line include PrefixSpan [ 21 ], FreeSpan [ 8 ],
TreeMiner [ 28 ], SPADE [ 27 ] and FEQT [ 3 ]. For graph mining, we have MoFa
[ 5 ], FFSM [ 10 ], SPIN [ 22 ], Gaston [ 16 ] and TSMiner [ 12 ] for mining large-scale
topological structures.
For both breadth-first and depth-first approaches, the inherent computational bot-
tleneck lies in the fact that they still need to examine all the frequent pattern candidates
and apply the size constraint to finally find the long patterns. When the total number
of frequent patterns is exponentially huge, all these algorithms have difficulties in
finish mining.
5
Row Enumeration Approach
There has been another line of work which, instead of enumerating patterns explicitly
by enumerating columns, finds frequent patterns by row enumeration. We call this
body of work row enumeration approach, which are based on the following observa-
tions. While the set of algorithms like Close [ 20 ] adopt breadth-first search which
is differerent from those adopting depth-first search such as CLOSET+ [ 24 ] and
CHARM [ 29 ], one thing they have in common is that they all perform pattern enu-
meration by explicitly enumerating the feature sets, which is usually called column
Search WWH ::




Custom Search