3 Mining Significant Graph Patterns
3.1 Problem Definition
Given a graph database $D = \{G_1, \ldots, G_n\}$ and an objective function $F$, a general problem definition for mining significant graph patterns can be formulated in two different ways: (1) find all subgraphs $g$ such that $F(g) \ge \delta$, where $\delta$ is a significance threshold; or (2) find a subgraph $g^*$ such that $g^* = \arg\max_g F(g)$.
No matter which formulation or which objective function is used, an efficient mining algorithm should find significant patterns directly, without exhaustively generating the whole set of graph patterns. Several algorithms [19, 28, 33, 34, 37, 50] have been proposed with different objective functions and pruning techniques. We discuss four recent studies: gboost [28], gPLS [34], LEAP [50], and GraphSig [33].
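To make the two formulations concrete, the following is a naive Python sketch. It assumes a pattern enumerator enumerate_subgraphs(D) and an objective F are supplied; both names are hypothetical placeholders, and the exhaustive enumeration here is exactly what the algorithms below are designed to avoid.

def mine_by_threshold(D, F, delta, enumerate_subgraphs):
    """Formulation (1): return all patterns g with F(g) >= delta."""
    return [g for g in enumerate_subgraphs(D) if F(g) >= delta]

def mine_best(D, F, enumerate_subgraphs):
    """Formulation (2): return a single pattern g* maximizing F."""
    return max(enumerate_subgraphs(D), key=F)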
3.2 gboost: A Branch-and-Bound Approach
Kudo et al. [28] presented an application of boosting to classifying labeled graphs, such as chemical compounds and natural language texts. A weak classifier called a decision stump uses a subgraph as a classification feature. A boosting algorithm then repeatedly constructs multiple weak classifiers on weighted training instances. A gain function is designed to evaluate the quality of a decision stump, i.e., how many weighted training instances it can classify correctly. The problem of finding the optimal decision stump in each iteration is then formulated as mining an "optimal" subgraph pattern. gboost designs a branch-and-bound mining approach based on the gain function and integrates it into gSpan to search for the "optimal" subgraph pattern.
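The pruning idea can be sketched generically as follows. This is only an illustration, not gboost's actual procedure: the function names are placeholders, and gain_upper_bound stands in for the bound gboost derives from its gain function, which must dominate the gain of every supergraph of a pattern so that whole branches of the search tree can be discarded.

def branch_and_bound(root_patterns, extend, gain, gain_upper_bound):
    """Generic branch-and-bound over a pattern-growth search tree.

    extend(g) yields the children of pattern g in the search tree
    (e.g., gSpan's rightmost extensions); gain_upper_bound(g) must
    bound gain(g') for every supergraph g' of g.
    """
    best_g, best_gain = None, float("-inf")
    stack = list(root_patterns)
    while stack:
        g = stack.pop()
        g_gain = gain(g)
        if g_gain > best_gain:
            best_g, best_gain = g, g_gain
        # Expand g only if some supergraph could still beat the incumbent.
        if gain_upper_bound(g) > best_gain:
            stack.extend(extend(g))
    return best_g, best_gain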
A Boosting Framework
gboost uses a simple classifier, the decision stump, for prediction according to a single feature. The subgraph-based decision stump is defined as follows.
Definition 13.9 (Decision Stumps for Graphs) Let $t$ and $x$ be labeled graphs and $y \in \{\pm 1\}$ be a class label. A decision stump classifier for graphs is given by

$$
h_{\langle t, y\rangle}(x) =
\begin{cases}
y, & t \subseteq x,\\
-y, & \text{otherwise.}
\end{cases}
$$
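As a minimal sketch of this definition (not gboost's actual implementation), the stump below tests the containment $t \subseteq x$ with networkx's GraphMatcher; the label-matching convention and the class layout are assumptions for illustration.

import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def contains(x, t):
    """True if pattern t has a label-preserving embedding in graph x."""
    gm = GraphMatcher(x, t,
                      node_match=lambda a, b: a.get("label") == b.get("label"))
    # Monomorphism: a (not necessarily induced) occurrence of t in x.
    return gm.subgraph_is_monomorphic()

class DecisionStump:
    """h_<t,y>(x): predict y if t occurs in x, otherwise -y."""
    def __init__(self, t, y):
        assert y in (+1, -1)
        self.t, self.y = t, y

    def predict(self, x):
        return self.y if contains(x, self.t) else -self.y

# Example: pattern A-B occurs in the path A-B-C, so the stump outputs +1.
x = nx.Graph()
x.add_nodes_from([(0, {"label": "A"}), (1, {"label": "B"}), (2, {"label": "C"})])
x.add_edges_from([(0, 1), (1, 2)])
t = nx.Graph()
t.add_nodes_from([(0, {"label": "A"}), (1, {"label": "B"})])
t.add_edge(0, 1)
print(DecisionStump(t, +1).predict(x))  # +1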
The decision stumps are trained to find a rule $\langle \hat{t}, \hat{y}\rangle$ that minimizes the error rate for the given training data $T = \{\langle x_i, y_i\rangle\}_{i=1}^{L}$:

$$
\langle \hat{t}, \hat{y}\rangle = \arg\min_{t \in \mathcal{F},\, y \in \{\pm 1\}} \frac{1}{L}\sum_{i=1}^{L} I\bigl(y_i \neq h_{\langle t, y\rangle}(x_i)\bigr),
$$

where $\mathcal{F}$ is the set of candidate subgraph patterns and $I(\cdot)$ is the indicator function.
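A direct, brute-force transcription of this training rule might look as follows, reusing the DecisionStump sketch above; gboost replaces this exhaustive scan over $\mathcal{F}$ with the branch-and-bound search described earlier. The helper name is hypothetical.

def train_stump(candidates, T):
    """Exhaustive arg-min over candidate patterns; T = [(x_i, y_i), ...].

    gboost avoids this brute-force scan over the (exponentially large)
    feature set F via its branch-and-bound search (Sect. 3.2).
    """
    L = len(T)
    best_stump, best_err = None, float("inf")
    for t in candidates:                 # t ranges over the feature set F
        for y in (+1, -1):
            stump = DecisionStump(t, y)  # sketch from Definition 13.9 above
            err = sum(stump.predict(x_i) != y_i for x_i, y_i in T) / L
            if err < best_err:
                best_stump, best_err = stump, err
    return best_stump, best_err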