Database Reference
In-Depth Information
even mining all of maximal subgraphs could be infeasible in some real world appli-
cations. To avoid this problem, ORIGAMI first finds a sample M
of the complete
set of maximal frequent subgraphs
.
The goal is to find a set of maximal subgraphs, M
M
, which is as diverse as possible.
To achieve this goal, ORIGAMI avoids using combinatorial enumeration to mine
maximal subgraph patterns. Instead, it adopts a random walk approach to enumerate
a diverse set of maximal subgraphs from the positive border of such maximal patterns.
The randomized mining algorithm starts with an empty pattern and iteratively adds a
random edge during each extension, until a maximal subgraph M is generated and no
more edges can be added. This process walks a random chain in the partial order of
frequent subgraphs. To extend an intermediate pattern, S k
M , it chooses a random
vertex v from which the extension will be attempted. Then a random edge e incident
on v is selected for extension. If no such edge is found, no extension is possible from
the vertex. When no vertices can have any further extension in S k , the random walk
terminates and S k =
M is the maximal graph. On the other hand, if a random edge e
is found, the other endpoint v of this edge is randomly selected. By adding the edge
e and its endpoint v , a candidate subgraph pattern S k + 1 is generated and its support
is computed. This random walk process repeats until no further extension is possible
on any vertex. Then the maximal subgraph M is returned.
Ideally, the random chain walks would cover different regions of the pattern space,
thus would produce dissimilar maximal patterns. However, in practice, this may not
be the case, since duplicate maximal subgraphs can be generated in the following
ways: (1) multiple iterations following overlapping chains, or (2) multiple iterations
following different chains but leading to the same maximal pattern. Let's consider
a maximal subgraph M of size n . Let e 1 e 2 ...e n be a sequence of random edge
extensions, corresponding to a random chain walk leading from an empty graph φ
to the maximal graph M . The probability of a particular edge sequence leading from
φ to M is given as
n
P [( e 1 e 2 ...e n )]
=
P ( e 1 )
P ( e i |
e 1 ...e i 1 ) .
(13.15)
i = 2
Let ES ( M ) denote the set of all valid edge sequences for a graph M . The
probability that a graph M is generated in a random walk is proportional to
P [( e 1 e 2 ...e n )] .
(13.16)
e 1 e 2 ...e n
ES ( M )
The probability of obtaining a specific maximal pattern depends on the number
of chains or edge sequences leading to that pattern and the size of the pattern. Ac-
cording to Eq. ( 13.15 ), as a graph grows larger, the probability of the edge sequence
becomes smaller. So this random walk approach in general favors a maximal sub-
graph of smaller size than one of larger size. To avoid generating duplicate maximal
subgraphs, a termination condition is designed based on an estimate of the collision
rate of the generated patterns. Intuitively the collision rate keeps track of the number
Search WWH ::




Custom Search