Mining Graph Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

Frequent graph patterns are subgraphs that are found from a collection of graphs

or a single massive graph with a frequency no less than a user-specified support

threshold. Frequent subgraphs are useful at characterizing graph sets, discriminating

different groups of graphs, classifying and clustering graphs, and building graph in-

dices. Borgelt and Berthold [ 7 ] illustrated the discovery of active chemical structures

in an HIV-screening dataset by contrasting the support of frequent graphs between

different classes. Deshpande et al. [ 15 ] used frequent structures as features to classify

chemical compounds. Huan et al. [ 22 ] successfully applied the frequent graph mining

technique to study protein structural families. Frequent graph patterns were also used

as indexing features by Yan et al. [ 48 ] to perform fast graph search. Their method out-

performs the traditional path-based indexing approach significantly. Koyuturk et al.

[ 27 ] proposed a method to detect frequent subgraphs in biological networks, where

considerably large frequent sub-pathways in metabolic networks are observed.

In this chapter, we will first review the existing graph pattern mining methods and

identify the combinatorial explosion problem in these methods—the graph pattern

search space grows exponentially with the pattern size. It causes two serious prob-

lems: (1) the computational bottleneck, i.e., it takes very long, or even forever, for the

algorithms to complete the mining process, and (2) patterns' applicability, i.e., the

huge mining result set hinders the potential usage of graph patterns in many real-life

applications. We will then introduce scalable graph pattern mining paradigms which

mine significant subgraphs [ 19 , 28 , 33 , 34 , 37 , 50 ], representative subgraphs [ 2 ] and

dense subgraphs [ 10 - 12 , 18 , 36 , 39 , 41 , 42 , 44 , 52 ]. In addition, we also introduce

the state-of-the-art graph mining algorithms under new application settings, such as

in a graph stream [ 1 , 3 , 5 , 6 ] and in uncertain graphs [ 26 , 53 ].

Frequent Subgraph Mining

2.1

Problem Definition

The vertex set of a graph g is denoted by V ( g ) and the edge set by E ( g ). A label

function, l , maps a vertex or an edge to a label. A graph g is a subgraph of another

graph g if there exists a subgraph isomorphism from g to g , denoted by g

g . g

⊆

is called a supergraph of g .

Definition 13.1 (Subgraph Isomorphism) For two labeled graphs g and g ,a

subgraph isomorphism is an injective function f : V ( g )

V ( g ) , s.t., (1),

→

∀

∈

l ( f ( v )) ; and (2),

V ( g ), l ( v )

∀

( u , v )

∈

E ( g ), ( f ( u ),

l ( f ( u ), f ( v )) , where l and l are the labeling functions

of g and g , respectively. f is called an embedding of g in g .

E ( g ) and l ( u , v )

f ( v ))

∈

Definition 13.2 (Frequent Graph) Given a labeled graph dataset D

G 1 , G 2 ,

... , G n }

and a subgraph g, the supporting graph set of g is D g ={

G i |

⊆

G i , G i ∈

| D g |

| D |

D }

. A frequent graph is a graph whose

support is no less than a minimum support threshold, min_sup.

. The support of g is suppor t ( g )

Frequent Pattern Mining

Search WWH ::

Custom Search

Home