Database Reference
In-Depth Information
Theorem 4.12
Given a graph G, the error bound , the diameter upper bound
D
max
, the support threshold σ and K, with probability at least
1
−
,
SpiderMine
returns a set S of top-K largest subgraphs of G such that for each P
∈
S,
|
P
sup
|≥
σ
and diam
(
P
)
D
max
.
It has been shown in [
33
] that the reasons why the spider-based algorithm could
recover long patterns efficiently are (1) Spiders reduce combinatorial complexity
in recovering long patterns, and (2) Spiders minimize the heavy cost of graph
isomorphism checking.
≤
6.2
Fusion-style Pattern Merge
While piece-wise pattern merge could to certain extent expedite the long pattern
discovery, some more aggressive solutions have also been proposed. In particular,
[
31
] has proposed a pattern-merge based algorithm for probabilistically finding large
frequent itemsets adopting a
fusion
-style approach. The task there is to efficiently
find a good approximation to the set of all the large frequent patterns, which are also
called
colossal
patterns.
As shown with the pattern lattice model, in previous mining models pattern can-
didates are examined by implicitly or explicitly traversing a search tree in either a
breadth-first or depth-first manner. When the search tree is exponential in size at
some level, such exhaustive traversal has to run with an exponential time complex-
ity. A new mining model was therefore developed in [
31
] to attack the problem. The
mining strategy,
PatternFusion
, distinguishes itself from other methods in that it is
able to fuse small frequent patterns in a fusion-style into colossal patterns, which is
even more aggressive than piece-wise patter merge approach. It avoids the pitfalls
of both breadth-first and depth-first search by applying the following concepts.
1.
Pattern-Fusion
traverses the tree in a bounded-breadth way. It always pushes
down a frontier of a bounded-size candidate pool, i.e., only a fixed number of
patterns in the current candidate pool will be used as starting nodes to go down-
wards in the pattern tree. As such, it avoids the problem of exponential search
space.
2.
Pattern-Fusion
has the capability to identify “shortcuts” whenever possible. The
growth of each pattern is not performed with one item addition, but an agglomer-
ation of multiple patterns in the pool. These shortcuts will direct
Pattern-Fusion
down the search tree much more rapidly toward the colossal patterns.
Figure
4.8
conceptualizes this mining model.
Pattern-Fusion
is based on a study on the relationship between the support set
of a colossal pattern and those of its subpatterns reveals the notion of
robustness
of
colossal patterns. Colossal patterns exhibit
robustness
in the sense that
if a small
number of items are removed from the pattern, the resulting pattern would have a
similar support set
. The larger the pattern size, the more prominent this robustness