Database Reference
In-Depth Information
3.2
THE COMPLEXITY OF P (Q)
We now turn to the complexity of the query evaluation problem. Throughout this section we assume
that the query Q is a Boolean query, thus the goal is to compute P (Q) . This is without loss of
generality, since the probability of any possible answer
a to a non-Boolean query Q( x) reduces to
the probability of the Boolean query Q [ a/x ]
, more precisely, P(a Q) = P(Q [ a/x ] ) . We also
restrict the input probabilistic database to a tuple-independent database: if a query Q is hard on
tuple-independent databases, then it remains hard over more expressive probabilistic databases, as
long as these allow tuple-independent databases as a special case.
We are interested in the data complexity of the query evaluation problem: for a fixed query Q ,
what is the complexity as a function of the database D ? The answer will depend on the query: for
some queries, the complexity is in polynomial time, for other queries it is not. A query Q is called
tractable if its data complexity is in polynomial time; otherwise, the query Q is called intractable .
Recall that, over deterministic databases, for every query in the relational calculus the data complexity
is in polynomial time [ Va rd i , 1982 ], hence it is tractable according to our terminology.
In this section we prove that, for each of the queries below, the evaluation problem is hard for
#P:
H 0 = R(x),S(x,y),T(y)
H 1 = R(x 0 ), S(x 0 ,y 0 ) S(x 1 ,y 1 ), T (y 1 )
H 2 = R(x 0 ), S 1 (x 0 ,y 0 ) S 1 (x 1 ,y 1 ), S 2 (x 1 ,y 1 ) S 2 (x 2 ,y 2 ), T (y 2 )
H 3 = R(x 0 ), S 1 (x 0 ,y 0 ) S 1 (x 1 ,y 1 ), S 2 (x 1 ,y 1 ) S 2 (x 2 ,y 2 ), S 3 (x 2 ,y 2 ) S 3 (x 3 ,y 3 ), T (y 3 )
...
Each query is a Boolean query, but we have dropped the quantifiers for conciseness; that is, H 0 =
x. y.R(x),S(x,y),T (y) , etc. For each query H k , we are interested in evaluating P(H k ) on a tuple-
independent probabilistic database D
= ( R, S 1 ,...,S k ,T ,P) , and measure the complexity as a
function of the size of D (that is, the query is fixed).
For ever y k
0, the data complexity of the query H k is hard for # P .
Theorem 3.2
Proof. We give two separate proofs, one for H 0 and one for H 1 ; the proof for H k , for k 2 is a
non-trivial extension of that for H 1 and is omitted; it can be found in [ Dalvi and Suciu , 2010 ].
The proof for H 0 is by reduction from #PP2DNF ( Theorem 3.1 ). Consider any formula
=
(i,j)
X i Y j
(3.1)
E
and construct the following probabilistic database instance D
= ( R, S, T ,P) , where: R =
{ X 1 ,X 2 ,... }
, T
={ Y 1 ,Y 2 ,... }
, S ={ (X i ,Y j ) | (i, j) E }
, and the probability function is de-
fined as follows: P(R(X i )) = P(T(Y j )) =
1 / 2, P(S(X i ,Y j )) =
1. Every possible world is of the
Search WWH ::




Custom Search