Database Reference
In-Depth Information
Note that
|
C(θ)
|
is a number between 0 and m . Then, we claim that:
P( ¬ H 1 | E θ ) = z | C(θ) |
(3.4)
Indeed, consider a world W that satisfies E θ . Since H 1 =
R(x 0 ), S(x 0 ,y 0 )
S(x 1 ,y 1 ), T (y 1 ) ,we
have
¬ (W
|=
H 1 ) iff both queries R(x),S(x,y) and S(x,y),T (y) are false on W . Consider a tuple
Y j , then either R W (X i ) is true, or T W (Y j ) is true (because
(X i ,Y j )
S .If θ satisfies the clause X i
¬ S W (X i ,Y j ) to ensure that both R W (X i ), S W (X i ,Y j ) and
S W (X i ,Y j ), T W (Y j ) are false; the probability of the event
|=
W
E θ ), and therefore we must have
S W (X i ,Y j ) is z .If θ does not satisfy
the clause X i Y j , then both queries R(X i ), S(X i ,Y j ) and S(X i ,Y j ), T (Y j ) are false regardless of
whether S W
¬
H 1 ) iff S W
contains the tuple (X i ,Y j ) or not. In other words,
¬ (W
|=
does not contain
any tuple (X i ,Y j ) for which (i, j) C(θ) ; this proves Eq. (3.4) .
Finally, we compute P( ¬ H 1 ) . For any number c , 0 c m , let # c =
the number of valuations
θ that satisfy exactly c clauses, i.e., # c =|{ θ | c =| C(θ) |}|
. Then, Eq. (3.3) and Eq. (3.4) become:
2 n
c = 0 ,m
1
# c · z c
P( ¬ H 1 ) =
This is a polynomial in z of degree m , with coefficients #0 , #1 ,..., # m . In other words, and oracle
for P(
# m , because in # represents the
number of valuations that satisfy all m clauses. Therefore, we can compute # using an oracle for
P( ¬ H 1 ) as follows. Choose any m + 1 distinct values for z ( 0 , 1 ) , and construct m + 1 different
database instances R, S, T (they are isomorphic, and differ only in the probabilities in S , which
are set to 1 z ). Then we call the oracle, and obtain the value of the polynomial at that point z .
From these m
¬
H 1 ) computes the polynomial above. Note that #
=
1 values we will derive all the coefficients, e.g., by using Lagrange's polynomial
interpolation formula. The leading coefficient, # m , is precisely # .
+
As we will show in Chapter 4 , not all queries are hard; in fact, many queries can be evalu-
ated quite efficiently on tuple-independent, or on BID databases. Moreover, the list of queries H k ,
k
0 , 1 ,... is not even complete: there are many other queries that are also hard. For Unions of
Conjunctive Queries (UCQ), we know exactly which queries are hard, and this class includes all the
queries H k and others too; we will discuss this in Chapter 4 . For the full Relational Calculus (RC),
the class of hard queries is not known exactly (of course, it includes all UCQ queries that are hard).
By using the queries H k as primitives, it is quite easy to prove that some other queries are
hard. We illustrate this on the example below which is both interesting in its own right, and also
illustrates one key proof technique used in the general hardness proof [ Dalvi and Suciu , 2010 ].
=
Example 3.3
Consider the Boolean query
Q =∃ x. y. z.U(x,y),U(y,z)
We prove that it is hard for # P . The query checks for the presence of a path of length 2 in the graph
defined by the binary edge relation U . We will prove that it is hard even if we restrict the graph to a
Search WWH ::




Custom Search