TheQuery Evaluation Problem - Probabilistic Databases

Database Reference

In-Depth Information

k -partitioned graph, i.e., a graph where the vertices are partitioned into k disjoint sets, and every edge

goes from some node in partition i to some node in partition i + 1, for some i = 1 ,k − 1. The ques-

tion is, how large should we choose k . Clearly, if we choose k

2, i.e., we consider bipartite graphs,

then Q is always false, hence it is easy to compute P (Q) (it is 0). If we consider 3-partite graphs, then

there are two kinds of edges: from partition 1 to 2, denoted U 1 (x, y) and from partition 2 to 3, denoted

U 2 (x, y) . These two sets are disjoint, hence Q is equivalent to

∃ x. ∃ y. ∃ z.U 1 (x, y), U 2 (y, z) , and

this query has polynomial time data complexity (it can be computed using the rules described in the

next chapter, as P (Q) =

− a ( 1

− P( ∃ x.U 1 (x, a)) · P( ∃ z.U 2 (a, z))) , and P( ∃ x.U 1 (x, a)) =

1 − ( 1 − b ( 1 − P(U 1 (b, a)))) , and similarly for P( ∃ z.U 2 (a, z)) .). So Q is also easy on 3-partite

graphs.

Consider therefore 4-partite graphs. Now, there are three kinds of edges, denoted U 1 ,U 2 ,U 3 ,

and the query is equivalent to the following (we omit existential quantifiers):

U 1 (x, y), U 2 (y, z)

U 2 (y, z), U 3 (z, v)

∨

In other words, a path of length 2 can either consist of two edges in U 1 and U 2 , or of two edges in

U 2 and U 3 . Next, we make a further restriction on the 4-partite graph: we restrict the first partition

to have a single node s (call it “source node”), and the fourth partition to have a single node t (call

it “target node”). We prove that Q is hard even if the input is restricted to 4-partite graphs of this

kind. Indeed, then Q becomes:

Q = U 1 (s, y), U 2 (y, z) ∨ U 2 (y, z), U 3 (z, t)

This is precisely H 1 = R(y),S(y,z) ∨ S(y,z),T (z) , up to the renaming of relations: R(y) ≡

U 1 (s, y) ; S(y,z)

U 2 (y, z) ; and T(z)

U 3 (z, t) .

≡

3.3

BIBLIOGRAPHIC AND HISTORICAL NOTES

Valiant [ 1979 ] introduced the complexity class #P and showed, among other things, that model

counting for propositional formulas is hard for #P, even when restricted to positive 2CNF (P2CNF).

Provan and Ball [ 1983 ] showed that model counting for Partitioned Positive 2CNF (PP2CNF) is

also hard for #P.

There exists different kinds of reductions for proving #P-hardness, which result in slightly

different types of #P-hardness results: our proof of Theorem 3.2 uses a 1-Turing reduction for the

hardness proof of H 0 and a Turing reduction for the hardness proof of H 1 . Durand et al. [ 2005 ]

discusses various notions of reductions.

The data complexity of queries on probabilistic databases was first considered by Grädel et al.

[ 1998 ], which showed, in essence, that the Boolean query R(x),S(x,y),R(y) is hard for #P, by

reduction from P2DNF. Dalvi and Suciu [ 2004 ] consider conjunctive queries without self-joins,

and prove a dichotomy into polynomial-time and #P-hard queries. In particular, they prove that

H 0 = R(x),S(x,y),T(y) is #P-hard by reduction from PP2DNF (same proof as in this chapter).

Probabilistic Databases

Search WWH ::

Custom Search

Home