Database Reference
In-Depth Information
Kuramochi and Karypis [
30
] proposed two efficient algorithms that can find fre-
quent subgraphs within a large sparse graph. The first algorithm, called HSIGRAM,
follows a horizontal approach and finds frequent subgraphs in a breadth-first fashion.
The second algorithm, called VSIGRAM, follows a vertical approach and finds the
frequent subgraphs in a depth-first fashion. For the support measure defined in [
30
],
all possible occurrences
ϕ
of a pattern
p
in a graph
g
are calculated. An
overlap-
graph
is constructed where each occurrence
ϕ
corresponds to a node and there is an
edge between the nodes of
ϕ
and
ϕ
if they overlap. This is called
simple overlap
as
defined below.
Definition 13.6 (Simple Overlap)
Given a pattern p
(
V
(
p
),
E
(
p
))
, a simple
overlap of occurrences ϕ and ϕ
of pattern p exists if ϕ
(
E
(
p
))
=
ϕ
(
E
(
p
))
.
The support of
p
is defined as the size of the maximum independent set (MIS) of
the overlap-graph. A later study [
16
] proved that the MIS-support is anti-monotone.
Fiedler and Borgelt [
16
] suggested a definition that relies on the non-existence
of equivalent ancestor embeddings in order to guarantee that the resulting support
is anti-monotone. The support is called
harmful overlap support
. The basic idea of
this measure is that some of the simple overlaps (in [
30
]) can be disregarded without
harming the anti-monotonicity of the support measure. As in [
30
], an overlap graph
is constructed and the support is defined as the size of the MIS. The major difference
is the definition of the overlap.
Definition 13.7 (Harmful Overlap)
Given a pattern p
∩
= ∅
=
(
V
(
p
),
E
(
p
))
, a harmful
overlap of occurrences ϕ and ϕ
V
(
p
):
ϕ
(
v
),
ϕ
(
v
)
of pattern p exists if
∃
v
∈
∈
∩
ϕ
(
V
(
p
))
.
Bringmann and Nijssen [
8
] examined the existing studies [
16
,
30
] and iden-
tified the expensive operation of solving the MIS problem. They defined a new
support measure.
ϕ
(
V
(
p
))
Definition 13.8 (Minimum Image Based Support)
Given a pattern p
=
(
V
(
p
),
E
(
p
))
, the minimum image based support of p in g is defined as
σ
∧
(
p
,
g
)
=
min
V
(
p
)
|{
ϕ
i
(
v
):
ϕ
i
is an occurrence of p in g
}|
.
v
∈
It is based on the number of unique nodes in the graph
g
to which a node of
the pattern
p
is mapped. This measure avoids the MIS computation. Therefore it is
computationally less expensive and often closer to intuition than measures proposed
in [
16
,
30
].
By taking the node in
p
which is mapped to the least number of unique nodes in
g
,
the anti-monotonicity of
σ
∧
can be guaranteed. For the definition of support, several
computational benefits could be identified: (1) instead of
O
(
n
2
) potential overlaps,
where
n
is the possibly exponential number of occurrences, the method only needs
to maintain a set of vertices for every node in the pattern, which can be done in
O
(
n
);
(2) the method does not need to solve an NP complete MIS problem; and (3) it is
not necessary to compute all occurrences: it is sufficient to determine for every pair
of
v
V
(
p
) and
v
∈
v
.
∈
V
(
g
) if there is one occurrence in which
ϕ
(
v
)
=