Database Reference
In-Depth Information
Frequent graph patterns are subgraphs that are found from a collection of graphs
or a single massive graph with a frequency no less than a user-specified support
threshold. Frequent subgraphs are useful at characterizing graph sets, discriminating
different groups of graphs, classifying and clustering graphs, and building graph in-
dices. Borgelt and Berthold [ 7 ] illustrated the discovery of active chemical structures
in an HIV-screening dataset by contrasting the support of frequent graphs between
different classes. Deshpande et al. [ 15 ] used frequent structures as features to classify
chemical compounds. Huan et al. [ 22 ] successfully applied the frequent graph mining
technique to study protein structural families. Frequent graph patterns were also used
as indexing features by Yan et al. [ 48 ] to perform fast graph search. Their method out-
performs the traditional path-based indexing approach significantly. Koyuturk et al.
[ 27 ] proposed a method to detect frequent subgraphs in biological networks, where
considerably large frequent sub-pathways in metabolic networks are observed.
In this chapter, we will first review the existing graph pattern mining methods and
identify the combinatorial explosion problem in these methods—the graph pattern
search space grows exponentially with the pattern size. It causes two serious prob-
lems: (1) the computational bottleneck, i.e., it takes very long, or even forever, for the
algorithms to complete the mining process, and (2) patterns' applicability, i.e., the
huge mining result set hinders the potential usage of graph patterns in many real-life
applications. We will then introduce scalable graph pattern mining paradigms which
mine significant subgraphs [ 19 , 28 , 33 , 34 , 37 , 50 ], representative subgraphs [ 2 ] and
dense subgraphs [ 10 - 12 , 18 , 36 , 39 , 41 , 42 , 44 , 52 ]. In addition, we also introduce
the state-of-the-art graph mining algorithms under new application settings, such as
in a graph stream [ 1 , 3 , 5 , 6 ] and in uncertain graphs [ 26 , 53 ].
2
Frequent Subgraph Mining
2.1
Problem Definition
The vertex set of a graph g is denoted by V ( g ) and the edge set by E ( g ). A label
function, l , maps a vertex or an edge to a label. A graph g is a subgraph of another
graph g if there exists a subgraph isomorphism from g to g , denoted by g
g . g
is called a supergraph of g .
Definition 13.1 (Subgraph Isomorphism) For two labeled graphs g and g ,a
subgraph isomorphism is an injective function f : V ( g )
V ( g ) , s.t., (1),
v
l ( f ( v )) ; and (2),
V ( g ), l ( v )
=
( u , v )
E ( g ), ( f ( u ),
l ( f ( u ), f ( v )) , where l and l are the labeling functions
of g and g , respectively. f is called an embedding of g in g .
E ( g ) and l ( u , v )
f ( v ))
=
={
Definition 13.2 (Frequent Graph) Given a labeled graph dataset D
G 1 , G 2 ,
... , G n }
and a subgraph g, the supporting graph set of g is D g ={
G i |
g
G i , G i
| D g |
| D |
D }
. A frequent graph is a graph whose
support is no less than a minimum support threshold, min_sup.
. The support of g is suppor t ( g )
=
Search WWH ::




Custom Search