Database Reference
In-Depth Information
representation of the BID table has schema
T(z,y,P)
, and there may be several tuples with
z
=
b
,
and the sum of their probabilities is
≤
1. A tuple-independent table is a special case of a BID table,
where all attributes are key attributes: thus,
S(
x,y
)
is a tuple-independent table, and in that case,
we simply write it as
S(x,y)
when no confusion arises.
The six query evaluation rules discussed in
Section 4.1
extend to BID tables, with the following
changes.
A
root variable
is a a variables
x
occurs in every atom
in a key position
.A
separator variable
is a
root variable such that for any two unifiable atoms it occurs on a common key position. Thus,
non-key positions do not matter when determining whether a variable is a root or a separator
variable.
All six rules in
Section 4.1
remain unchanged. The
independent-project
applies to any separator
variable, with the adjustment mentioned here.
There is one new rule, called
disjoint project
. It comes in two variants. First, suppose
Q
is a
Boolean conjunctive query that has an atom
R(...)
where (a) all attributes in key positions
contain constants (i.e., there are no variables in key positions), and (b)
x
is a variable occurring
on a non-key attribute of
R
. Then:
P (Q)
=
P(Q
[
a/x
]
)
a
∈
ADom
Second, suppose
Q
is a UCQ query of the form
Q
=
Q
1
∨
Q
where
Q
1
is a conjunctive
query having an atom where all key attribute are constants, and
Q
is any other UCQ query.
Then we write
P (Q)
=
P(Q
1
)
+
P(Q
)
−
P(Q
1
∧
Q
)
, then apply the disjoint-project rule
both to
Q
1
and to
Q
1
∧
Q
.
It is known that for the class of conjunctive queries without self-joins,
CQ
NR
, the rules extended
with disjoint project are sufficient to compute all polynomial-time queries. Thus, these queries
admit a dichotomy over BID tables. The hard queries in
CQ
NR
over BID tables can be
described by three types of queries:
H
0
=
R(x),S(
x,y
), T (
y
)
(here all three tables are tuple-
independent), and two more queries,
R(
x
, y), S(
y
)
and
R(
x
,y),S(x,
y
)
[
Dalvi and Suciu
,
2007c
]. It is not known whether a similar dichotomy holds for UCQ.
Finally, the disjoint-project rule above corresponds to a simple relational operator, called
disjoint
project
,
d
A
: during duplicate elimination, it computes the output probability as
p
1
+
p
2
+
...
Example 4.33
Let
R(x),S(x,y)
be tuple-independent tables, and
T(
z
,y)
,
U(
w
,z)
be BID-tables.
Consider the Boolean query
Q
=
R(x), S(x, y), T (
z
, y), U(a,z)
, where
a
is a constant. Then we