Mining Graph Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

Kuramochi and Karypis [ 30 ] proposed two efficient algorithms that can find fre-

quent subgraphs within a large sparse graph. The first algorithm, called HSIGRAM,

follows a horizontal approach and finds frequent subgraphs in a breadth-first fashion.

The second algorithm, called VSIGRAM, follows a vertical approach and finds the

frequent subgraphs in a depth-first fashion. For the support measure defined in [ 30 ],

all possible occurrences ϕ of a pattern p in a graph g are calculated. An overlap-

graph is constructed where each occurrence ϕ corresponds to a node and there is an

edge between the nodes of ϕ and ϕ if they overlap. This is called simple overlap as

defined below.

Definition 13.6 (Simple Overlap) Given a pattern p

( V ( p ), E ( p )) , a simple

overlap of occurrences ϕ and ϕ of pattern p exists if ϕ ( E ( p ))

ϕ ( E ( p ))

The support of p is defined as the size of the maximum independent set (MIS) of

the overlap-graph. A later study [ 16 ] proved that the MIS-support is anti-monotone.

Fiedler and Borgelt [ 16 ] suggested a definition that relies on the non-existence

of equivalent ancestor embeddings in order to guarantee that the resulting support

is anti-monotone. The support is called harmful overlap support . The basic idea of

this measure is that some of the simple overlaps (in [ 30 ]) can be disregarded without

harming the anti-monotonicity of the support measure. As in [ 30 ], an overlap graph

is constructed and the support is defined as the size of the MIS. The major difference

is the definition of the overlap.

Definition 13.7 (Harmful Overlap) Given a pattern p

∩

= ∅

( V ( p ), E ( p )) , a harmful

overlap of occurrences ϕ and ϕ

V ( p ): ϕ ( v ), ϕ ( v )

of pattern p exists if

∃

∈

∩ ϕ ( V ( p )) .

Bringmann and Nijssen [ 8 ] examined the existing studies [ 16 , 30 ] and iden-

tified the expensive operation of solving the MIS problem. They defined a new

support measure.

ϕ ( V ( p ))

Definition 13.8 (Minimum Image Based Support) Given a pattern p

( V ( p ),

E ( p )) , the minimum image based support of p in g is defined as

σ ∧ ( p , g )

min

V ( p ) |{

ϕ i ( v ): ϕ i is an occurrence of p in g

∈

It is based on the number of unique nodes in the graph g to which a node of

the pattern p is mapped. This measure avoids the MIS computation. Therefore it is

computationally less expensive and often closer to intuition than measures proposed

in [ 16 , 30 ].

By taking the node in p which is mapped to the least number of unique nodes in g ,

the anti-monotonicity of σ ∧ can be guaranteed. For the definition of support, several

computational benefits could be identified: (1) instead of O ( n 2 ) potential overlaps,

where n is the possibly exponential number of occurrences, the method only needs

to maintain a set of vertices for every node in the pattern, which can be done in O ( n );

(2) the method does not need to solve an NP complete MIS problem; and (3) it is

not necessary to compute all occurrences: it is sufficient to determine for every pair

of v

V ( p ) and v ∈

v .

∈

V ( g ) if there is one occurrence in which ϕ ( v )

Frequent Pattern Mining

Search WWH ::

Custom Search

Home