Mining Graph Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

the number of false posi tiv es or negatives, two sets are used to approximate

S α by

samling: (1) the first set S which tries to maximize the recall of discovering highly

reliable subgraphs; and (2) the second set S which tries to maximize the precision

of discovery.

Under this sampling framework, a subproblem is to determine the vertex subset

V s in a graph G for discovering the induced subgraph. This is formulated as the

frequent cohesive set mining problem. Specifically, given a set of graphs D

{

V and a

minimal support threshold θ , a frequent cohesive set is any subset of vertices V s ⊆

G 1 , G 2 , ... , G N }

with vertices V ( G 1 )

V ( G 2 )

...

V ( G N )

taht is a cohesive set in at least θ

N graphs. A two-stage mining algorithm is proposed

for mining the frequent cohesive sets. In the first stage, a top-down peeling process is

employed to iteratively refine patterns to make them converge into maximal frequent

cohesive sets. In the second stage, a DFS mining process is employed which utilizes

the maximal frequent cohesive sets as the boundary to prune the search space and

discover all the non-maximal frequent cohesive sets.

Zou et al. [ 53 ] studied the problem of mining frequent subgraphs on uncertain

graph data under probabilistic semantics. An uncertain graph database is denoted as

where each G i is an uncertain graph. An implicated graph

database is denoted as D

G 1 , G 2 , ... , G n }

where each G i is an implicated

certain graph of an uncertain graph in D . For a certain subgraph S , its support

in D is sup ( S ; D )

G 1 , G 2 , ... , G m }

|{ G ∈ D | S G }|

| D |

. Then the probability that the support of S is

no less than 0

≤

ϕ 1 across all implicated graph databases of D is Pr ( S ; D , ϕ )

D ∈ Imp ( D ), sup ( S ; D ) ≥ ϕ Pr ( D

D ) is the probability of D

implicating D , and Imp ( D ) is the set of all implicated graph databases of D .A

subgraph S is called ( ϕ , τ )-probabilistic frequent if the ϕ -frequent probability of S is

no less than a user-specified confidence threshold 0

⇒

D ), where Pr ( D

⇒

1. The problem of mining

frequent subgraphs in an uncertain graph database under probabilistic semantics is,

given an uncertain graph database D , a support threshold 0

≤ τ ≤

≤ ϕ ≤

1, and a confidence

threshold 0

1, find all ( ϕ , τ )-probabilistic frequent subgraphs in D .

It is proven that it is #P-hard to compute the ϕ -frequent probability of a subgraph

S in an uncertain graph database and to count the number of frequent subgraphs in

an uncertain graph database. Then an approximate mining algorithm is proposed to

find a broader set of subgraphs including all frequent subgraphs and a fraction of

infrequent subgraphs but with ϕ -frequent probability at least τ

≤ τ ≤

is an error tolerance. The main steps of the algorithm are as follows. First, organize

all subgraphs in D into a search tree, where nodes represent subgraphs, and each

node is subgraph isomorphic to all its children if it has any, and has one less edge

than all of them. Second, examine the subgraphs in the search tree in depth-first

order. For each examined subgraph S , determine in polynomial time whether S has

ϕ -frequent probability at least τ

−

ε , where 0

≤

ε and probably at least τ . If the answer is “yes”,

then output S and proceed to examine the descendants of S in depth-first order.

Otherwise, all descendants of S are infrequent and can be pruned due to the apriori

property. A dynamic programming algorithm is proposed for approximating the ϕ -

frequent probability, Pr ( S ; D , ϕ ), of a subgraph S by an interval [ p l , p u ] such that

−

p u −

p l |≤

∈

ε and Pr ( S ; D , ϕ )

[ p l , p u ].

Frequent Pattern Mining

Search WWH ::

Custom Search

Home