Mining Long Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

have i j

= i j + 1 −

1, 1

≤ j<m , then We call Z a substring of S . We use

| S |

to denote

the length of a string S .

Definition 4.2 (Frequent Subsequence) For a string dataset D

S 1 , ... , S n }

S for all S ∈

and a string S, let D S be the subset of D such that S

⊆

D S . Then S is

frequent in D if | D S |

σ , where | D S |

| D |

is called the support of S in D, written s ( S ) ,

and σ is the minimum support threshold , 0

| D | ≥

≤

1 .

Graphs The setting of graphs represent the most complicated case for mining long

patterns, which is further divided into two settings: graph transaction setting and

single graph setting. As a convention, the vertex set of a graph G is denoted by V ( G )

and the edge set by E ( G ). The size of a graph P is defined by the number of edges

of P , written as

. In frequent graph mining setting, a graph G

( V ( G ), E ( G ))

is associated with a labeling function l G : V ( G )

. Graph

isomorphism in our problem setting requires matching of the labels for each mapped

pair of vertices. Most methods can also be applied to graphs with edge labels.

→

Σ , Σ

ς 1 , ς 2 , ... , ς k }

Two labeled graphs G and G are

Definition 4.3 (Labeled Graph Isomorphism)

V ( G ) , such that

isomorphic if there exists a bijection f : V ( G )

→

∀

∈

V ( G ) ,

E ( G ) .

We use G = L G to denote that two labeled graphs G and G are isomorphic.

Given two graphs P and G , a subgraph G of G is called an embedding of P in G

if P = L G . For a single graph G and a pattern P , we use e P to denote a particular

embedding of a pattern P , and the set of all embeddings of P is denoted as E [ P ]. We

denote as P sup the support set for a pattern P . In single graph setting, P sup = E [ P ]

while in graph transaction setting P sup is the set of graphs of the database each

containing at least one embedding of P .

l G ( u )

l G ( f ( u )) and ( u , v )

∈

E ( G ) if and only if ( f ( u ), f ( v ))

∈

Definition 4.4 (Frequent Graph) Given D as a graph dataset D

G 1 , ... , G n }

or a single graph, and a graph G , G is frequent in D if | P sup |

| D |

≥ σ , where σ is the

minimum support threshold, 0

In an effort to reduce the size of the frequent pattern mining result, concepts of

closed patterns and maximal patterns have been proposed which apply to all the

different data formats.

≤ σ

≤

Definition 4.5 (Closed Pattern) A pattern p is a closed pattern in a data set D if

p is frequent in D and there exists no proper super-pattern p such that p

p and

⊂

p has the same support as p in D.

Definition 4.6 (Maximal Pattern) A pattern p is a maximal pattern in a data set

D if p is frequent in D and there exists no super-pattern p such that p

p and p

⊂

is frequent in D.

It is worth noting that long patterns in a data set are usually the maximal patterns.

As such, algorithms mining for maximal patterns would naturally return the long

patterns. We therefore give priority to these algorithms in this chapter.

Frequent Pattern Mining

Search WWH ::

Custom Search

Home