Database Reference
In-Depth Information
Fig. 4.7 SpiderMine
i< D max
2r
A
B
D max
spiders within P which are initially picked in the random draw as H P . According
to the authors' observation, for any two spiders in H P , there must be a pattern
growth path such that along the path their super-patterns will be able to merge. And
SpiderMine is going to catch that as follows. Once all the spiders are picked, they
will be grown to larger patterns in λ iterations where λ will be determined by D max .
In each iteration, each spider will be grown in a procedure called SpiderGrow() ,
which always expands the current pattern by appending spiders to its boundary such
that the pattern's radius is increased by r . Also, in each iteration, two patterns will be
merged if some of their embeddings are found to overlap and the resulting merged
pattern is frequent enough. Now for any long pattern P , the following Lemma holds,
Lemma 4.10 For any pattern P with diameter upper-bound D max , let Spider-
Grow(Q) be a procedure which grows a pattern Q such that the radius of Q is
increased by r, then all patterns growing out of H P which are sub-patterns of P
must have merged into one sub-pattern of P after λ
D max
2 r
=
iterations of running
SpiderGrow(Q) .
This means that as long as one picks more than one spider within a long pattern
P in the initial random draw, i. e.,
|
H P |
> 1, one can guarantee he will not miss
P by retaining all the merged patterns. On the other hand, for smaller patterns, the
probability that more than one spider within the pattern get picked in the random
draw is much lower than that of long patterns. As such, keeping only the merged
patterns at the end of the iterations would highly likely prune away patterns that
would grow only toward small patterns. Thus after the pruning, what are left are a
small number of candidates each of which, with high probability, is a subgraph of
long patterns. We then use SpiderGrow() again to further extend these candidates
until no larger patterns can be found.
The SpiderMine algorithm works in the following three stages. An illustration is
given in Fig. 4.7 .
Search WWH ::




Custom Search