Database Reference
In-Depth Information
Kuramochi and Karypis [ 30 ] proposed two efficient algorithms that can find fre-
quent subgraphs within a large sparse graph. The first algorithm, called HSIGRAM,
follows a horizontal approach and finds frequent subgraphs in a breadth-first fashion.
The second algorithm, called VSIGRAM, follows a vertical approach and finds the
frequent subgraphs in a depth-first fashion. For the support measure defined in [ 30 ],
all possible occurrences ϕ of a pattern p in a graph g are calculated. An overlap-
graph is constructed where each occurrence ϕ corresponds to a node and there is an
edge between the nodes of ϕ and ϕ if they overlap. This is called simple overlap as
defined below.
Definition 13.6 (Simple Overlap) Given a pattern p
( V ( p ), E ( p )) , a simple
overlap of occurrences ϕ and ϕ of pattern p exists if ϕ ( E ( p ))
=
ϕ ( E ( p ))
.
The support of p is defined as the size of the maximum independent set (MIS) of
the overlap-graph. A later study [ 16 ] proved that the MIS-support is anti-monotone.
Fiedler and Borgelt [ 16 ] suggested a definition that relies on the non-existence
of equivalent ancestor embeddings in order to guarantee that the resulting support
is anti-monotone. The support is called harmful overlap support . The basic idea of
this measure is that some of the simple overlaps (in [ 30 ]) can be disregarded without
harming the anti-monotonicity of the support measure. As in [ 30 ], an overlap graph
is constructed and the support is defined as the size of the MIS. The major difference
is the definition of the overlap.
Definition 13.7 (Harmful Overlap) Given a pattern p
= ∅
=
( V ( p ), E ( p )) , a harmful
overlap of occurrences ϕ and ϕ
V ( p ): ϕ ( v ), ϕ ( v )
of pattern p exists if
v
ϕ ( V ( p )) .
Bringmann and Nijssen [ 8 ] examined the existing studies [ 16 , 30 ] and iden-
tified the expensive operation of solving the MIS problem. They defined a new
support measure.
ϕ ( V ( p ))
Definition 13.8 (Minimum Image Based Support) Given a pattern p
=
( V ( p ),
E ( p )) , the minimum image based support of p in g is defined as
σ ( p , g )
=
min
V ( p ) |{
ϕ i ( v ): ϕ i is an occurrence of p in g
}|
.
v
It is based on the number of unique nodes in the graph g to which a node of
the pattern p is mapped. This measure avoids the MIS computation. Therefore it is
computationally less expensive and often closer to intuition than measures proposed
in [ 16 , 30 ].
By taking the node in p which is mapped to the least number of unique nodes in g ,
the anti-monotonicity of σ can be guaranteed. For the definition of support, several
computational benefits could be identified: (1) instead of O ( n 2 ) potential overlaps,
where n is the possibly exponential number of occurrences, the method only needs
to maintain a set of vertices for every node in the pattern, which can be done in O ( n );
(2) the method does not need to solve an NP complete MIS problem; and (3) it is
not necessary to compute all occurrences: it is sufficient to determine for every pair
of v
V ( p ) and v
v .
V ( g ) if there is one occurrence in which ϕ ( v )
=
Search WWH ::




Custom Search