Database Reference
In-Depth Information
Theorem 4.12 Given a graph G, the error bound , the diameter upper bound
D max , the support threshold σ and K, with probability at least 1
, SpiderMine
returns a set S of top-K largest subgraphs of G such that for each P
S,
| P sup |≥ σ
and diam ( P )
D max .
It has been shown in [ 33 ] that the reasons why the spider-based algorithm could
recover long patterns efficiently are (1) Spiders reduce combinatorial complexity
in recovering long patterns, and (2) Spiders minimize the heavy cost of graph
isomorphism checking.
6.2
Fusion-style Pattern Merge
While piece-wise pattern merge could to certain extent expedite the long pattern
discovery, some more aggressive solutions have also been proposed. In particular,
[ 31 ] has proposed a pattern-merge based algorithm for probabilistically finding large
frequent itemsets adopting a fusion -style approach. The task there is to efficiently
find a good approximation to the set of all the large frequent patterns, which are also
called colossal patterns.
As shown with the pattern lattice model, in previous mining models pattern can-
didates are examined by implicitly or explicitly traversing a search tree in either a
breadth-first or depth-first manner. When the search tree is exponential in size at
some level, such exhaustive traversal has to run with an exponential time complex-
ity. A new mining model was therefore developed in [ 31 ] to attack the problem. The
mining strategy, PatternFusion , distinguishes itself from other methods in that it is
able to fuse small frequent patterns in a fusion-style into colossal patterns, which is
even more aggressive than piece-wise patter merge approach. It avoids the pitfalls
of both breadth-first and depth-first search by applying the following concepts.
1. Pattern-Fusion traverses the tree in a bounded-breadth way. It always pushes
down a frontier of a bounded-size candidate pool, i.e., only a fixed number of
patterns in the current candidate pool will be used as starting nodes to go down-
wards in the pattern tree. As such, it avoids the problem of exponential search
space.
2. Pattern-Fusion has the capability to identify “shortcuts” whenever possible. The
growth of each pattern is not performed with one item addition, but an agglomer-
ation of multiple patterns in the pool. These shortcuts will direct Pattern-Fusion
down the search tree much more rapidly toward the colossal patterns.
Figure 4.8 conceptualizes this mining model.
Pattern-Fusion is based on a study on the relationship between the support set
of a colossal pattern and those of its subpatterns reveals the notion of robustness of
colossal patterns. Colossal patterns exhibit robustness in the sense that if a small
number of items are removed from the pattern, the resulting pattern would have a
similar support set . The larger the pattern size, the more prominent this robustness
Search WWH ::




Custom Search