Database Reference
In-Depth Information
1. Stage I: Mining Spiders
Mine all r -spiders from the input graph G . By the end of this stage, all the frequent
patterns up to a diameter 2 r with all their embeddings in G are obtained.
2. Stage II: Large Pattern Identification
Randomly pick M spiders from all the spiders obtained in Stage I as the initial set
of frequent subgraphs. The next step consists of
D max
2 r iterations. In each iteration,
use SpiderGrow() to grow each of the M subgraphs by extending its boundary
with selected spiders such that the radius of the subgraph is increased by r . In each
iteration, if it is detected that two frequent subgraphs, whose embeddings are all
previously disjoint, begin to overlap on some of their embeddings as a result of
growth in this iteration, the algorithm would merge them if the resulting merged
subgraph is frequent. Note that one can avoid pair-wise checking for potential
merging because all patterns grow with spiders as units and one only has to
monitor the same spiders being used by different patterns to detect overlapping.
At the end of the
D max
2 r iterations, keep only those frequent subgraphs which are
generated as a result of merging at some iteration. Let the set kept be S . The
frequent subgraphs in S are believed to be subgraphs of long patterns with high
probability.
3. Stage III: Large Pattern Recovery
With high probability, each one of the top- K long patterns now has some portion
of it as a pattern in S . To recover the full patterns, grow each subgraph in S
by SpiderGrow() until no more frequent patterns can be found. All the patterns
discovered so far are maintained in a list sorted by their size. Return the top- K
patterns.
A key question here is that, in Stage II of SpiderMine , how to choose M , the
number of initial seed spiders, to achieve the discovery of top- K largest patterns
with guaranteed probability. If more than one spider within a pattern P are chosen
in the random drawing process, it is said that P is successfully identified. Denote as
P success the probability that all the top- K largest patterns are successfully identified.
In [ 33 ], the authors show the following lemma with detailed proof sketch.
Lemma 4.11
Given a network G and a user-specified K, we have P success
1
) M K
V min
| V ( G ) |
.
V min is the minimum number of vertices in a long pattern required by users,
usually an easy lower bound that a user can specify. Now to compute M , we just
need to set 1
( M
+
1)(1
) M K
V min
| V ( G )
and solve for M . It follows that,
once the user specifies K and , we could compute M accordingly, and then if we
pick M spiders initially in the random drawing process, we are able to return the
top- K largest patterns with probability at least 1
( M
+
1)(1
=
1
|
. For example, with
=
0 . 1,
| V ( G )
|
K
=
10, and V min =
, we get M
=
85, which means to return top 10 largest
10
patterns(each of size at least | V ( G ) |
10 if any) with probability at least 90 %, we need to
randomly draw 85 spiders initially. With the analysis above, it is not hard to prove
the following theorem.
 
Search WWH ::




Custom Search