Database Reference
In-Depth Information
have i j
= i j + 1
1, 1
j<m , then We call Z a substring of S . We use
| S |
to denote
the length of a string S .
Definition 4.2 (Frequent Subsequence) For a string dataset D
={
S 1 , ... , S n }
S for all S
and a string S, let D S be the subset of D such that S
D S . Then S is
frequent in D if | D S |
σ , where | D S |
| D |
is called the support of S in D, written s ( S ) ,
and σ is the minimum support threshold , 0
| D |
σ
1 .
Graphs The setting of graphs represent the most complicated case for mining long
patterns, which is further divided into two settings: graph transaction setting and
single graph setting. As a convention, the vertex set of a graph G is denoted by V ( G )
and the edge set by E ( G ). The size of a graph P is defined by the number of edges
of P , written as
|
P
|
. In frequent graph mining setting, a graph G
=
( V ( G ), E ( G ))
is associated with a labeling function l G : V ( G )
. Graph
isomorphism in our problem setting requires matching of the labels for each mapped
pair of vertices. Most methods can also be applied to graphs with edge labels.
Σ , Σ
={
ς 1 , ς 2 , ... , ς k }
Two labeled graphs G and G are
Definition 4.3 (Labeled Graph Isomorphism)
V ( G ) , such that
isomorphic if there exists a bijection f : V ( G )
u
V ( G ) ,
E ( G ) .
We use G = L G to denote that two labeled graphs G and G are isomorphic.
Given two graphs P and G , a subgraph G of G is called an embedding of P in G
if P = L G . For a single graph G and a pattern P , we use e P to denote a particular
embedding of a pattern P , and the set of all embeddings of P is denoted as E [ P ]. We
denote as P sup the support set for a pattern P . In single graph setting, P sup = E [ P ]
while in graph transaction setting P sup is the set of graphs of the database each
containing at least one embedding of P .
l G ( u )
=
l G ( f ( u )) and ( u , v )
E ( G ) if and only if ( f ( u ), f ( v ))
Definition 4.4 (Frequent Graph) Given D as a graph dataset D
={
G 1 , ... , G n }
or a single graph, and a graph G , G is frequent in D if | P sup |
| D |
σ , where σ is the
minimum support threshold, 0
1.
In an effort to reduce the size of the frequent pattern mining result, concepts of
closed patterns and maximal patterns have been proposed which apply to all the
different data formats.
σ
Definition 4.5 (Closed Pattern) A pattern p is a closed pattern in a data set D if
p is frequent in D and there exists no proper super-pattern p such that p
p and
p has the same support as p in D.
Definition 4.6 (Maximal Pattern) A pattern p is a maximal pattern in a data set
D if p is frequent in D and there exists no super-pattern p such that p
p and p
is frequent in D.
It is worth noting that long patterns in a data set are usually the maximal patterns.
As such, algorithms mining for maximal patterns would naturally return the long
patterns. We therefore give priority to these algorithms in this chapter.
Search WWH ::




Custom Search