Database Reference
In-Depth Information
Fig. 13.1
AGM: Two
candidate patterns formed by
two chains
+
The
AGM
algorithm uses a
vertex-based candidate generation
method that in-
creases the subgraph size by one vertex in each iteration. Two size-(
k
1) frequent
subgraphs are joined only when the two graphs have the same size-
k
subgraph. Here,
graph size
means the number of vertices in a graph. The newly formed candidate
includes the common size-
k
subgraph and the additional two vertices from the two
size-(
k
+
1) patterns. Figure
13.1
depicts the two subgraphs joined by two chains.
The
FSG
algorithm adopts an
edge-based candidate generation
strategy that
increases the subgraph size by one edge in each iteration. Two size-(
k
+
+
1) patterns
are merged if and only if they share the same subgraph having
k
edges. In the
edge-
disjoint path
method [
38
], graphs are classified by the number of disjoint paths they
have, and two paths are edge-disjoint if they do not share any common edge. A
subgraph pattern with
k
+1 disjoint paths is generated by joining subgraphs with
k
disjoint paths.
The
Apriori
-based algorithms mentioned above have considerable overhead when
two size-
k
frequent subgraphs are joined to generate size-(
k
1) candidate patterns. In
order to avoid this kind of overhead, non-
Apriori
-based algorithms were developed,
most of which adopt the pattern-growth methodology, as discussed below.
+
2.3
Pattern-Growth Approach
Pattern-growth graph mining algorithms include
gSpan
by Yan and Han [
45
],
MoFa
by Borgelt and Berthold [
7
],
FFSM
by Huan et al. [
21
],
SPIN
by Huan et al. [
23
],
and
Gaston
by Nijssen and Kok [
31
]. These algorithms are inspired by
PrefixSpan
[
32
],
TreeMinerV
[
51
], and
FREQT
[
4
] in mining sequences and trees, respectively.
The pattern-growth algorithm extends a frequent graph directly by adding a new
edge, in every possible position. It does not perform expensive join operations. A
potential problem with the edge extension is that the same graph can be discovered
multiple times. The
gSpan
algorithm helps avoiding the discovery of duplicates by
introducing a
right-most extension
technique, where the only extensions take place
on the
right-most path
[
45
]. A right-most path for a given graph is the straight path
from the starting vertex
v
0
to the last vertex
v
n
, according to a depth-first search on
the graph.
Besides the frequent subgraph mining algorithms, constraint-based subgraph min-
ing algorithms have also been proposed. Mining closed graph patterns was studied
by Yan and Han [
46
]. Mining coherent subgraphs was studied by Huan et al. [
22
]. Chi
et al. proposed CMTreeMiner to mine closed and maximal frequent subtrees [
13
].
For relational graph mining, Yan et al. [
49
] developed two algorithms,
CloseCut