Database Reference
In-Depth Information
Fig. 13.4 Structural
proximity
proximity
g
B
A
functions can be represented as a function of p and q for a subgraph pattern g ,as
F ( g )
f ( p ( g ), q ( g )).
If in a graph dataset, g
=
e and g often occur together, then g and g might also
occur together. Hence, likely p ( g )
q ( g ), which means similar
objective scores. This is resulted by the structural and embedding similarity between
the starting structures g
p ( g ) and q ( g )
e and g . We call it structural proximity : Neighbor branches
in the pattern search tree exhibit strong similarity not only in pattern composition,
but also in their embeddings in the graph datasets, thus having similar frequencies
and objective scores. In summary, a conceptual claim can be drawn,
g
g
F ( g )
F ( g ) .
(13.9)
According to structural proximity, it seems reasonable to skip the whole search
branch once its nearby branch is searched, since the best scores between neighbor
branches are likely similar. Here, we would like to emphasize “likely” rather than
“surely”. Based on this intuition, if the branch A in Fig. 13.4 has been searched, B
could be “leaped over” if A and B branches satisfy some similarity criterion. The
length of leap can be controlled by the frequency difference of two graphs g and
g
e . The leap condition is defined as follows.
Let I ( G , g , g
e ) be an indicator function of a graph G : I ( G , g , g
e )
=
1, for
any supergraph g
of g ,if g
g
g
e such that g
G ,
=
G ; otherwise
1, it means if a supergraph g of g has an embedding
in G , there must be an embedding of g
0. When I ( G , g , g
e )
=
e in G . For a positive dataset D + , let
e ), g
D + ( g , g
e )
={
G
|
I ( G , g , g
e )
=
1, g
G , G
D + }
.In D + ( g , g
g
and g =
g
e have the same frequency. Define + ( g , g
e ) as follows,
|
D + ( g , g
e )
|
+ ( g , g
e )
=
p ( g )
.
|
D + |
e ) is actually the maximum frequency difference that g
and g
+ ( g , g
could
have in D + . If the difference is smaller than a threshold σ , then leap,
2 + ( g , g e )
p ( g e )
2 ( g , g e )
q ( g e )
+ p ( g ) σ and
+ q ( g ) σ.
(13.10)
σ controls the leap length. The larger σ is, the faster the search is. Structural leap
search will generate an optimal pattern candidate and reduce the need for thoroughly
Search WWH ::




Custom Search