Database Reference
In-Depth Information
bottle n eck
exploratory task
graph index
mi n e
select
graph classification
graph clustering
exponential pattern space
significant patterns
graph dataset
Fig. 13.2 Graph pattern application pipeline
2.6
The Computational Bottleneck
Most graph mining methods follow the combinatorial pattern enumeration paradigm.
In real world applications including bioinformatics and social network analysis, the
complete enumeration of patterns is practically infeasible. It often turns out that
the mining results, even those for closed graphs [ 46 ] or maximal graphs [ 23 ], are
explosive in size.
Figure 13.2 depicts the pipeline of graph applications built on frequent subgraphs.
In this pipeline, frequent subgraphs are mined first; then significant patterns are
selected based on user-defined objective functions for different applications. Unfor-
tunately, the potential of graph patterns is hindered by the limitation of this pipeline,
due to a scalability issue. For instance, in order to find subgraphs with the highest
statistical significance, one has to enumerate all the frequent subgraphs first, and then
calculate their p value one by one. Obviously, this two-step process is not scalable
due to the following two reasons: (1) for many objective functions, the minimum
frequency threshold has to be set very low so that none of significant patterns will
be missed—a low-frequency threshold often means an exponential pattern set and
an extremely slow mining process; and (2) there is a lot of redundancy in frequent
subgraphs; most of them are not worth computing at all. When the complete mining
results are prohibitively large, yet only the significant or representative ones are of
real interest. It is inefficient to wait forever for the mining algorithm to finish and
then apply post-processing to the huge mining result. In order to complete mining
in a limited period of time, a user usually has to sacrifice patterns' quality. In short,
the frequent subgraph mining step becomes the bottleneck of the whole pipeline in
Fig. 13.2 .
In the following discussion, we will introduce recent graph pattern mining
methods that overcome the scalability bottleneck. The first series of studies
[ 19 , 28 , 33 , 34 , 37 , 50 ] focus on mining the optimal or significant subgraphs accord-
ing to user-specified objective functions in a timely fashion by accessing only a small
subset of promising subgraphs. The second study by Hasan et al. [ 2 ] generates an
orthogonal set of graph patterns that are representative. All these studies avoid gen-
erating the complete set of frequent subgraphs while presenting only a compact set
of interesting subgraph patterns, thus solving the scalability and applicability issues.
Search WWH ::




Custom Search