Database Reference
In-Depth Information
have:
P (Q) =
P (R(x), S(x, y), T ( b , y), U( a ,b))
b
=
P (R(x), S(x, y), T ( b ,y))
·
P(U( a ,b))
b
=
P (R(x), S(x, c), T ( b ,c))
·
P(U( a ,b))
b
c
=
P(R(x),S(x,c)) · P(T(b,c)) · P(U(a,b))
c
b
=
( 1
( 1 P (R(d), S(d, c)))) · P(T(b,c)) · P(U(a,b))
c
b
d
Now consider the non-Boolean variant of the query: Q(w) = R(x), S(x, y), T ( z , y), U(w,z) .It
can be computed by the following safe plan:
= w [ i y (R(x)
i x S(x,y))
i y T( z ,y)
z U(w,z)
P
4.3.2 DETERMINISTIC TABLES
In many applications of probabilistic databases, several of the tables are deterministic. Typically, one
starts with a standard database and extends it with new probabilistic tables that contain the uncertain
data. Queries are therefore expressed over a mixed schema where some tables are probabilistic and
others are deterministic. So far, we have assumed that all tables occurring in a query are probabilistic.
Deterministic tables could be treated simply as probabilistic tables where all tuples have probability
1 . 0, but this approach is too pessimistic: by recognizing that some tables are deterministic, we
can often transform a hard query into a tractable query. For example, consider the Boolean query
Q = R(x),S(x,y),T(y) : if all three tables are tuple-independent tables, then this query is exactly
H 0 , which we have shown in Subsection 4.1.3 to be intractable. However, assume now that T is
a deterministic table. Then the query can be computed by the following safe plan: i
i x
(R(x)
(S(x, y)
y T(y)) . The last join is between a probabilistic table and a deterministic table. Thus, the
query becomes tractable, if T is known to be deterministic.
Consider a probabilistic database, where each relation name is labeled as being either de-
terministic or probabilistic. To take advantage of the deterministic tables, we make the following
change to the independent-project rule: we require the variable x to occur in all key positions of
all probabilistic tables only. For example, in R(x),S(x,y),T(y) ,if T is deterministic, then we call x
a root variable since it occurs in all probabilistic tables (and it is also a separator variable since the
query is without self-joins). The modified rule is still sound. It is not known whether the rules R 6 are
complete over probabilistic databases with mixed schemas, except for conjunctive queries without
self-joins: here is known that the rules are complete [ Dalvi and Suciu , 2007c ].
 
Search WWH ::




Custom Search