3 Mining Significant Graph Patterns
3.1 Problem Definition
Given a graph database $D = \{G_1, \ldots, G_n\}$ and an objective function $F$, a general problem definition for mining significant graph patterns can be formulated in two different ways: (1) find all subgraphs $g$ such that $F(g) \ge \delta$, where $\delta$ is a significance threshold; or (2) find a subgraph $g^*$ such that $g^* = \arg\max_g F(g)$.
No matter which formulation or which objective function is used, an efficient mining algorithm should find significant patterns directly, without exhaustively generating the whole set of graph patterns. Several algorithms [19, 28, 33, 34, 37, 50] have been proposed with different objective functions and pruning techniques. We discuss four recent studies: gboost [28], gPLS [34], LEAP [50], and GraphSig [33].
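To make the two formulations concrete, the following is a naive Python sketch. It assumes a pattern enumerator enumerate_subgraphs(D) and an objective F are supplied; both names are hypothetical placeholders, and the exhaustive enumeration here is exactly what the algorithms below are designed to avoid.

def mine_by_threshold(D, F, delta, enumerate_subgraphs):
    """Formulation (1): return all patterns g with F(g) >= delta."""
    return [g for g in enumerate_subgraphs(D) if F(g) >= delta]

def mine_best(D, F, enumerate_subgraphs):
    """Formulation (2): return a single pattern g* maximizing F."""
    return max(enumerate_subgraphs(D), key=F)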
3.2 gboost: A Branch-and-Bound Approach
Kudo et al. [28] presented an application of boosting to classifying labeled graphs, such as chemical compounds and natural language texts. A weak classifier called a decision stump uses a subgraph as a classification feature. A boosting algorithm then repeatedly constructs multiple weak classifiers on weighted training instances. A gain function is designed to evaluate the quality of a decision stump, i.e., how many weighted training instances it can classify correctly. The problem of finding the optimal decision stump in each iteration is then formulated as mining an "optimal" subgraph pattern. gboost designs a branch-and-bound mining approach based on the gain function and integrates it into gSpan to search for the "optimal" subgraph pattern.
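The pruning idea can be sketched generically as follows. This is only an illustration, not gboost's actual procedure: the function names are placeholders, and gain_upper_bound stands in for the bound gboost derives from its gain function, which must dominate the gain of every supergraph of a pattern so that whole branches of the search tree can be discarded.

def branch_and_bound(root_patterns, extend, gain, gain_upper_bound):
    """Generic branch-and-bound over a pattern-growth search tree.

    extend(g) yields the children of pattern g in the search tree
    (e.g., gSpan's rightmost extensions); gain_upper_bound(g) must
    bound gain(g') for every supergraph g' of g.
    """
    best_g, best_gain = None, float("-inf")
    stack = list(root_patterns)
    while stack:
        g = stack.pop()
        g_gain = gain(g)
        if g_gain > best_gain:
            best_g, best_gain = g, g_gain
        # Expand g only if some supergraph could still beat the incumbent.
        if gain_upper_bound(g) > best_gain:
            stack.extend(extend(g))
    return best_g, best_gain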
A Boosting Framework
gboost uses a simple classifier, the decision stump, for prediction according to a single feature. The subgraph-based decision stump is defined as follows.
Definition 13.9 (Decision Stumps for Graphs) Let $t$ and $x$ be labeled graphs and $y \in \{\pm 1\}$ be a class label. A decision stump classifier for graphs is given by

$$
h_{\langle t, y\rangle}(x) =
\begin{cases}
y, & t \subseteq x,\\
-y, & \text{otherwise.}
\end{cases}
$$
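As a minimal sketch of this definition (not gboost's actual implementation), the stump below tests the containment $t \subseteq x$ with networkx's GraphMatcher; the label-matching convention and the class layout are assumptions for illustration.

import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def contains(x, t):
    """True if pattern t has a label-preserving embedding in graph x."""
    gm = GraphMatcher(x, t,
                      node_match=lambda a, b: a.get("label") == b.get("label"))
    # Monomorphism: a (not necessarily induced) occurrence of t in x.
    return gm.subgraph_is_monomorphic()

class DecisionStump:
    """h_<t,y>(x): predict y if t occurs in x, otherwise -y."""
    def __init__(self, t, y):
        assert y in (+1, -1)
        self.t, self.y = t, y

    def predict(self, x):
        return self.y if contains(x, self.t) else -self.y

# Example: pattern A-B occurs in the path A-B-C, so the stump outputs +1.
x = nx.Graph()
x.add_nodes_from([(0, {"label": "A"}), (1, {"label": "B"}), (2, {"label": "C"})])
x.add_edges_from([(0, 1), (1, 2)])
t = nx.Graph()
t.add_nodes_from([(0, {"label": "A"}), (1, {"label": "B"})])
t.add_edge(0, 1)
print(DecisionStump(t, +1).predict(x))  # +1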
The decision stumps are trained to find a rule $\langle \hat{t}, \hat{y}\rangle$ that minimizes the error rate for the given training data $T = \{\langle x_i, y_i\rangle\}_{i=1}^{L}$:

$$
\langle \hat{t}, \hat{y}\rangle = \arg\min_{t \in \mathcal{F},\, y \in \{\pm 1\}} \frac{1}{L}\sum_{i=1}^{L} I\bigl(y_i \neq h_{\langle t, y\rangle}(x_i)\bigr),
$$

where $\mathcal{F}$ is the set of candidate subgraph patterns and $I(\cdot)$ is the indicator function.
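A direct, brute-force transcription of this training rule might look as follows, reusing the DecisionStump sketch above; gboost replaces this exhaustive scan over $\mathcal{F}$ with the branch-and-bound search described earlier. The helper name is hypothetical.

def train_stump(candidates, T):
    """Exhaustive arg-min over candidate patterns; T = [(x_i, y_i), ...].

    gboost avoids this brute-force scan over the (exponentially large)
    feature set F via its branch-and-bound search (Sect. 3.2).
    """
    L = len(T)
    best_stump, best_err = None, float("inf")
    for t in candidates:                 # t ranges over the feature set F
        for y in (+1, -1):
            stump = DecisionStump(t, y)  # sketch from Definition 13.9 above
            err = sum(stump.predict(x_i) != y_i for x_i, y_i in T) / L
            if err < best_err:
                best_stump, best_err = stump, err
    return best_stump, best_err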