Database Reference
In-Depth Information
Fig. 13.1 AGM: Two
candidate patterns formed by
two chains
+
The AGM algorithm uses a vertex-based candidate generation method that in-
creases the subgraph size by one vertex in each iteration. Two size-( k
1) frequent
subgraphs are joined only when the two graphs have the same size- k subgraph. Here,
graph size means the number of vertices in a graph. The newly formed candidate
includes the common size- k subgraph and the additional two vertices from the two
size-( k
+
1) patterns. Figure 13.1 depicts the two subgraphs joined by two chains.
The FSG algorithm adopts an edge-based candidate generation strategy that
increases the subgraph size by one edge in each iteration. Two size-( k +
+
1) patterns
are merged if and only if they share the same subgraph having k edges. In the edge-
disjoint path method [ 38 ], graphs are classified by the number of disjoint paths they
have, and two paths are edge-disjoint if they do not share any common edge. A
subgraph pattern with k +1 disjoint paths is generated by joining subgraphs with k
disjoint paths.
The Apriori -based algorithms mentioned above have considerable overhead when
two size- k frequent subgraphs are joined to generate size-( k
1) candidate patterns. In
order to avoid this kind of overhead, non- Apriori -based algorithms were developed,
most of which adopt the pattern-growth methodology, as discussed below.
+
2.3
Pattern-Growth Approach
Pattern-growth graph mining algorithms include gSpan by Yan and Han [ 45 ], MoFa
by Borgelt and Berthold [ 7 ], FFSM by Huan et al. [ 21 ], SPIN by Huan et al. [ 23 ],
and Gaston by Nijssen and Kok [ 31 ]. These algorithms are inspired by PrefixSpan
[ 32 ], TreeMinerV [ 51 ], and FREQT [ 4 ] in mining sequences and trees, respectively.
The pattern-growth algorithm extends a frequent graph directly by adding a new
edge, in every possible position. It does not perform expensive join operations. A
potential problem with the edge extension is that the same graph can be discovered
multiple times. The gSpan algorithm helps avoiding the discovery of duplicates by
introducing a right-most extension technique, where the only extensions take place
on the right-most path [ 45 ]. A right-most path for a given graph is the straight path
from the starting vertex v 0 to the last vertex v n , according to a depth-first search on
the graph.
Besides the frequent subgraph mining algorithms, constraint-based subgraph min-
ing algorithms have also been proposed. Mining closed graph patterns was studied
by Yan and Han [ 46 ]. Mining coherent subgraphs was studied by Huan et al. [ 22 ]. Chi
et al. proposed CMTreeMiner to mine closed and maximal frequent subtrees [ 13 ].
For relational graph mining, Yan et al. [ 49 ] developed two algorithms, CloseCut
Search WWH ::




Custom Search