Mining Graph Patterns - Frequent Pattern Mining - page 308

Database Reference

In-Depth Information

bottle n eck

exploratory task

graph index

mi n e

select

graph classification

graph clustering

exponential pattern space

significant patterns

graph dataset

Fig. 13.2 Graph pattern application pipeline

2.6

The Computational Bottleneck

Most graph mining methods follow the combinatorial pattern enumeration paradigm.

In real world applications including bioinformatics and social network analysis, the

complete enumeration of patterns is practically infeasible. It often turns out that

the mining results, even those for closed graphs [ 46 ] or maximal graphs [ 23 ], are

explosive in size.

Figure 13.2 depicts the pipeline of graph applications built on frequent subgraphs.

In this pipeline, frequent subgraphs are mined first; then significant patterns are

selected based on user-defined objective functions for different applications. Unfor-

tunately, the potential of graph patterns is hindered by the limitation of this pipeline,

due to a scalability issue. For instance, in order to find subgraphs with the highest

statistical significance, one has to enumerate all the frequent subgraphs first, and then

calculate their p value one by one. Obviously, this two-step process is not scalable

due to the following two reasons: (1) for many objective functions, the minimum

frequency threshold has to be set very low so that none of significant patterns will

be missed—a low-frequency threshold often means an exponential pattern set and

an extremely slow mining process; and (2) there is a lot of redundancy in frequent

subgraphs; most of them are not worth computing at all. When the complete mining

results are prohibitively large, yet only the significant or representative ones are of

real interest. It is inefficient to wait forever for the mining algorithm to finish and

then apply post-processing to the huge mining result. In order to complete mining

in a limited period of time, a user usually has to sacrifice patterns' quality. In short,

the frequent subgraph mining step becomes the bottleneck of the whole pipeline in

Fig. 13.2 .

In the following discussion, we will introduce recent graph pattern mining

methods that overcome the scalability bottleneck. The first series of studies

[ 19 , 28 , 33 , 34 , 37 , 50 ] focus on mining the optimal or significant subgraphs accord-

ing to user-specified objective functions in a timely fashion by accessing only a small

subset of promising subgraphs. The second study by Hasan et al. [ 2 ] generates an

orthogonal set of graph patterns that are representative. All these studies avoid gen-

erating the complete set of frequent subgraphs while presenting only a compact set

of interesting subgraph patterns, thus solving the scalability and applicability issues.

Next Page

Frequent Pattern Mining

Search WWH ::

Custom Search

Home