Mining Graph Patterns - Frequent Pattern Mining - page 315

Database Reference

In-Depth Information

Fig. 13.4 Structural

proximity

proximity

g

B

A

functions can be represented as a function of p and q for a subgraph pattern g ,as

F ( g )

f ( p ( g ), q ( g )).

If in a graph dataset, g

=

e and g often occur together, then g and g might also

occur together. Hence, likely p ( g )

q ( g ), which means similar

objective scores. This is resulted by the structural and embedding similarity between

the starting structures g

p ( g ) and q ( g )

∼

∼

e and g . We call it structural proximity : Neighbor branches

in the pattern search tree exhibit strong similarity not only in pattern composition,

but also in their embeddings in the graph datasets, thus having similar frequencies

and objective scores. In summary, a conceptual claim can be drawn,

g ∼

g ⇒

F ( g )

F ( g ) .

∼

(13.9)

According to structural proximity, it seems reasonable to skip the whole search

branch once its nearby branch is searched, since the best scores between neighbor

branches are likely similar. Here, we would like to emphasize “likely” rather than

“surely”. Based on this intuition, if the branch A in Fig. 13.4 has been searched, B

could be “leaped over” if A and B branches satisfy some similarity criterion. The

length of leap can be controlled by the frequency difference of two graphs g and

g

e . The leap condition is defined as follows.

Let I ( G , g , g

e ) be an indicator function of a graph G : I ( G , g , g

e )

=

1, for

any supergraph g

of g ,if g

g

g

e such that g

⊆

G ,

∃

=

⊆

G ; otherwise

1, it means if a supergraph g of g has an embedding

in G , there must be an embedding of g

0. When I ( G , g , g

e )

=

e in G . For a positive dataset D + , let

e ), g ⊃

D + ( g , g

e )

={

G

|

I ( G , g , g

e )

=

1, g

⊆

G , G

∈

D + }

.In D + ( g , g

g

and g =

g

e have the same frequency. Define + ( g , g

e ) as follows,

− |

D + ( g , g

e )

|

+ ( g , g

e )

=

p ( g )

.

|

D + |

e ) is actually the maximum frequency difference that g

and g

+ ( g , g

could

have in D + . If the difference is smaller than a threshold σ , then leap,

2 + ( g , g e )

p ( g e )

2 − ( g , g e )

q ( g e )

+ p ( g ) ≤ σ and

+ q ( g ) ≤ σ.

(13.10)

σ controls the leap length. The larger σ is, the faster the search is. Structural leap

search will generate an optimal pattern candidate and reduce the need for thoroughly

Next Page

Frequent Pattern Mining

Search WWH ::

Custom Search

Home