Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

have:

P (Q) =

P (R(x), S(x, y), T ( b , y), U( a ,b))

b

=

P (R(x), S(x, y), T ( b ,y))

·

P(U( a ,b))

b

=

P (R(x), S(x, c), T ( b ,c))

·

P(U( a ,b))

b

c

=

P(R(x),S(x,c)) · P(T(b,c)) · P(U(a,b))

c

b

=

( 1 −

( 1 − P (R(d), S(d, c)))) · P(T(b,c)) · P(U(a,b))

c

b

d

Now consider the non-Boolean variant of the query: Q(w) = R(x), S(x, y), T ( z , y), U(w,z) .It

can be computed by the following safe plan:

= w [ i y (R(x)

i x S(x,y))

i y T( z ,y)

z U(w,z)

P

4.3.2 DETERMINISTIC TABLES

In many applications of probabilistic databases, several of the tables are deterministic. Typically, one

starts with a standard database and extends it with new probabilistic tables that contain the uncertain

data. Queries are therefore expressed over a mixed schema where some tables are probabilistic and

others are deterministic. So far, we have assumed that all tables occurring in a query are probabilistic.

Deterministic tables could be treated simply as probabilistic tables where all tuples have probability

1 . 0, but this approach is too pessimistic: by recognizing that some tables are deterministic, we

can often transform a hard query into a tractable query. For example, consider the Boolean query

Q = R(x),S(x,y),T(y) : if all three tables are tuple-independent tables, then this query is exactly

H 0 , which we have shown in Subsection 4.1.3 to be intractable. However, assume now that T is

a deterministic table. Then the query can be computed by the following safe plan: i

∅

i x

(R(x)

(S(x, y)

y T(y)) . The last join is between a probabilistic table and a deterministic table. Thus, the

query becomes tractable, if T is known to be deterministic.

Consider a probabilistic database, where each relation name is labeled as being either de-

terministic or probabilistic. To take advantage of the deterministic tables, we make the following

change to the independent-project rule: we require the variable x to occur in all key positions of

all probabilistic tables only. For example, in R(x),S(x,y),T(y) ,if T is deterministic, then we call x

a root variable since it occurs in all probabilistic tables (and it is also a separator variable since the

query is without self-joins). The modified rule is still sound. It is not known whether the rules R 6 are

complete over probabilistic databases with mixed schemas, except for conjunctive queries without

self-joins: here is known that the rules are complete [ Dalvi and Suciu , 2007c ].

Probabilistic Databases

Search WWH ::

Custom Search

Home