Database Reference
In-Depth Information
the number of false posi tiv es or negatives, two sets are used to approximate
S α by
samling: (1) the first set S which tries to maximize the recall of discovering highly
reliable subgraphs; and (2) the second set S which tries to maximize the precision
of discovery.
Under this sampling framework, a subproblem is to determine the vertex subset
V s in a graph G for discovering the induced subgraph. This is formulated as the
frequent cohesive set mining problem. Specifically, given a set of graphs D
=
{
V and a
minimal support threshold θ , a frequent cohesive set is any subset of vertices V s
G 1 , G 2 , ... , G N }
with vertices V ( G 1 )
=
V ( G 2 )
=
...
=
V ( G N )
=
V
taht is a cohesive set in at least θ
N graphs. A two-stage mining algorithm is proposed
for mining the frequent cohesive sets. In the first stage, a top-down peeling process is
employed to iteratively refine patterns to make them converge into maximal frequent
cohesive sets. In the second stage, a DFS mining process is employed which utilizes
the maximal frequent cohesive sets as the boundary to prune the search space and
discover all the non-maximal frequent cohesive sets.
Zou et al. [ 53 ] studied the problem of mining frequent subgraphs on uncertain
graph data under probabilistic semantics. An uncertain graph database is denoted as
D
·
where each G i is an uncertain graph. An implicated graph
database is denoted as D
={
G 1 , G 2 , ... , G n }
where each G i is an implicated
certain graph of an uncertain graph in D . For a certain subgraph S , its support
in D is sup ( S ; D )
={
G 1 , G 2 , ... , G m }
|{ G D | S G }|
| D |
=
. Then the probability that the support of S is
no less than 0
ϕ 1 across all implicated graph databases of D is Pr ( S ; D , ϕ )
=
D Imp ( D ), sup ( S ; D ) ϕ Pr ( D
D ) is the probability of D
implicating D , and Imp ( D ) is the set of all implicated graph databases of D .A
subgraph S is called ( ϕ , τ )-probabilistic frequent if the ϕ -frequent probability of S is
no less than a user-specified confidence threshold 0
D ), where Pr ( D
1. The problem of mining
frequent subgraphs in an uncertain graph database under probabilistic semantics is,
given an uncertain graph database D , a support threshold 0
τ
ϕ
1, and a confidence
threshold 0
1, find all ( ϕ , τ )-probabilistic frequent subgraphs in D .
It is proven that it is #P-hard to compute the ϕ -frequent probability of a subgraph
S in an uncertain graph database and to count the number of frequent subgraphs in
an uncertain graph database. Then an approximate mining algorithm is proposed to
find a broader set of subgraphs including all frequent subgraphs and a fraction of
infrequent subgraphs but with ϕ -frequent probability at least τ
τ
τ
is an error tolerance. The main steps of the algorithm are as follows. First, organize
all subgraphs in D into a search tree, where nodes represent subgraphs, and each
node is subgraph isomorphic to all its children if it has any, and has one less edge
than all of them. Second, examine the subgraphs in the search tree in depth-first
order. For each examined subgraph S , determine in polynomial time whether S has
ϕ -frequent probability at least τ
ε , where 0
ε
ε and probably at least τ . If the answer is “yes”,
then output S and proceed to examine the descendants of S in depth-first order.
Otherwise, all descendants of S are infrequent and can be pruned due to the apriori
property. A dynamic programming algorithm is proposed for approximating the ϕ -
frequent probability, Pr ( S ; D , ϕ ), of a subgraph S by an interval [ p l , p u ] such that
|
p u
p l |≤
ε and Pr ( S ; D , ϕ )
[ p l , p u ].
Search WWH ::




Custom Search