Database Reference
In-Depth Information
have:
P (Q)
=
P (R(x), S(x, y), T (
b
, y), U(
a
,b))
b
=
P (R(x), S(x, y), T (
b
,y))
·
P(U(
a
,b))
b
=
P (R(x), S(x, c), T (
b
,c))
·
P(U(
a
,b))
b
c
=
P(R(x),S(x,c))
·
P(T(b,c))
·
P(U(a,b))
c
b
=
(
1
−
(
1
−
P (R(d), S(d, c))))
·
P(T(b,c))
·
P(U(a,b))
c
b
d
Now consider the non-Boolean variant of the query:
Q(w)
=
R(x), S(x, y), T (
z
, y), U(w,z)
.It
can be computed by the following safe plan:
=
w
[
i
y
(R(x)
i
x
S(x,y))
i
y
T(
z
,y)
z
U(w,z)
P
4.3.2 DETERMINISTIC TABLES
In many applications of probabilistic databases, several of the tables are deterministic. Typically, one
starts with a standard database and extends it with new probabilistic tables that contain the uncertain
data. Queries are therefore expressed over a mixed schema where some tables are probabilistic and
others are deterministic. So far, we have assumed that
all
tables occurring in a query are probabilistic.
Deterministic tables could be treated simply as probabilistic tables where all tuples have probability
1
.
0, but this approach is too pessimistic: by recognizing that some tables are deterministic, we
can often transform a hard query into a tractable query. For example, consider the Boolean query
Q
=
R(x),S(x,y),T(y)
: if all three tables are tuple-independent tables, then this query is exactly
H
0
, which we have shown in
Subsection 4.1.3
to be intractable. However, assume now that
T
is
a deterministic table. Then the query can be computed by the following safe plan:
i
∅
i
x
(R(x)
(S(x, y)
y
T(y))
. The last join is between a probabilistic table and a deterministic table. Thus, the
query becomes tractable, if
T
is known to be deterministic.
Consider a probabilistic database, where each relation name is labeled as being either de-
terministic or probabilistic. To take advantage of the deterministic tables, we make the following
change to the independent-project rule: we require the variable
x
to occur in all key positions of
all
probabilistic tables
only. For example, in
R(x),S(x,y),T(y)
,if
T
is deterministic, then we call
x
a root variable since it occurs in all probabilistic tables (and it is also a separator variable since the
query is without self-joins). The modified rule is still sound. It is not known whether the rules
R
6
are
complete over probabilistic databases with mixed schemas, except for conjunctive queries without
self-joins: here is known that the rules are complete [
Dalvi and Suciu
,
2007c
].