Database Reference
In-Depth Information
have
i
j
=
i
j
+
1
−
1, 1
≤
j<m
, then We call
Z
a
substring
of
S
. We use
|
S
|
to denote
the length of a string
S
.
Definition 4.2 (Frequent Subsequence)
For a string dataset D
={
S
1
,
...
,
S
n
}
S
for all S
∈
and a string S, let D
S
be the subset of D such that S
⊆
D
S
. Then S is
frequent in D if
|
D
S
|
σ , where
|
D
S
|
|
D
|
is called the support of S in D, written s
(
S
)
,
and σ is the
minimum support threshold
,
0
|
D
|
≥
≤
σ
≤
1
.
Graphs
The setting of graphs represent the most complicated case for mining long
patterns, which is further divided into two settings: graph transaction setting and
single graph setting. As a convention, the
vertex set
of a graph
G
is denoted by
V
(
G
)
and the
edge set
by
E
(
G
). The size of a graph
P
is defined by the number of edges
of
P
, written as
|
P
|
. In frequent graph mining setting, a graph
G
=
(
V
(
G
),
E
(
G
))
is associated with a labeling function
l
G
:
V
(
G
)
. Graph
isomorphism in our problem setting requires matching of the labels for each mapped
pair of vertices. Most methods can also be applied to graphs with edge labels.
→
Σ
,
Σ
={
ς
1
,
ς
2
,
...
,
ς
k
}
Two
labeled graphs G and G
are
Definition 4.3 (Labeled Graph Isomorphism)
V
(
G
)
, such that
isomorphic if there exists a bijection f
:
V
(
G
)
→
∀
u
∈
V
(
G
)
,
E
(
G
)
.
We use
G
=
L
G
to denote that two labeled graphs
G
and
G
are isomorphic.
Given two graphs
P
and
G
, a subgraph
G
of
G
is called an
embedding
of
P
in
G
if
P
=
L
G
. For a single graph
G
and a pattern
P
, we use
e
P
to denote a particular
embedding of a pattern
P
, and the set of all embeddings of
P
is denoted as
E
[
P
]. We
denote as
P
sup
the support set for a pattern
P
. In single graph setting,
P
sup
=
E
[
P
]
while in graph transaction setting
P
sup
is the set of graphs of the database each
containing at least one embedding of
P
.
l
G
(
u
)
=
l
G
(
f
(
u
))
and
(
u
,
v
)
∈
E
(
G
)
if and only if
(
f
(
u
),
f
(
v
))
∈
Definition 4.4 (Frequent Graph)
Given
D
as a graph dataset
D
={
G
1
,
...
,
G
n
}
or a single graph, and a graph
G
,
G
is
frequent
in
D
if
|
P
sup
|
|
D
|
≥
σ
, where
σ
is the
minimum support threshold, 0
1.
In an effort to reduce the size of the frequent pattern mining result, concepts of
closed
patterns and
maximal
patterns have been proposed which apply to all the
different data formats.
≤
σ
≤
Definition 4.5 (Closed Pattern)
A pattern p is a closed pattern in a data set D if
p is frequent in D and there exists no proper super-pattern p
such that p
p
and
⊂
p
has the same support as p in D.
Definition 4.6 (Maximal Pattern)
A pattern p is a maximal pattern in a data set
D if p is frequent in D and there exists no super-pattern p
such that p
p
and p
⊂
is frequent in D.
It is worth noting that long patterns in a data set are usually the maximal patterns.
As such, algorithms mining for maximal patterns would naturally return the long
patterns. We therefore give priority to these algorithms in this chapter.