Database Reference
In-Depth Information
4. EXTENSIONAL QUERY EVALUATION
These too are independent because the two atoms
R(x,a,b,x)
and
R(y,y,z,z,)
do not unify. More
generally, pairwise syntactic independence implies independence as probabilistic events:
Proposition 4.2
Let Q
1
,Q
2
,...,Q
k
be queries that are pairwise syntactically independent. Then
Q
1
,...,Q
k
are independent probabilistic events.
The converse does not hold in general: two queries may be independent probabilistic
events, yet two of their atoms unify; this is because of the absorption law. An example is given
by
Q
1
=∃
x.
∃
y.
∃
z.
∃
u.R(x,y,z,z,u)
∧
R(x,x,x,y,y)
and
Q
2
=
R(a, a, b, b, c)
. If the lineage
of
Q
1
contains the Boolean variable
X
R(a,a,b,b,c)
, then the lineage must contain the conjunct
X
R(a,a,b,b,c)
∧
X
R(a,a,a,a,a)
. This means that the database contains the tuple
R(a, a, a, a, a)
, which
implies that the lineage also contains the conjunct
X
R(a,a,a,a,a)
∧
X
R(a,a,a,a,a)
=
X
R(a,a,a,a,a)
.By
the absorption law
X
R(a,a,b,b,c)
∧
X
R(a,a,a,a,a)
∨
X
R(a,a,a,a,a)
=
X
R(a,a,a,a,a)
, which means that
X
R(a,a,b,b,c)
∈
Va r (
Q
1
)
; therefore,
Q
1
and
Q
2
are independent, although two of their atoms unify.
It is possible to decide whether two queries
Q
1
and
Q
2
are independent probabilistic events
(and this problem is
2
-complete in the size of the queries [
Miklau and Suciu
,
2004
]). However,
we will not need that test in order to compute query probability, but we will rely only on the syntactic
independence: checking if
Q
1
and
Q
2
are syntactically independent can be done in polynomial time
in the size of the queries.
4.1.2 SIX SIMPLE RULES FOR
P (Q)
We now describe an approach for computing
P (Q)
on a tuple-independent probabilistic database
D
, by applying six simple computation rules. We will illustrate several examples in
Subsection 4.1.4
.
4.1.2.1 Rule 1: Independent Join
Suppose a relational query can be written as
Q
1
∧
Q
2
where
Q
1
,Q
2
are two syntactically indepen-
dent queries. Then:
Independent-join
P(Q
1
∧
Q
2
)
=
P(Q
1
)
·
P(Q
1
)
(4.1)
4.1.2.2 Rule 2: Independent Project
Consider an atom
L
in the query
Q
, and let
x
be a variable. Denote
Pos(L, x)
⊆[
arity(L)
]
the set
of positions where
x
occurs in
L
; this set may be empty.
Definition 4.3
Let
Q
be a relational query of the form
Q
=∃
x.Q
.
The variable
x
is called a
root variable
if every atom
L
in
Q
contains the variable
x
; in other
words,
Pos(L, x)
=∅
.
The variable
x
is called a
separator variable
if for any two atoms
L
1
,L
2
in
Q
that unify,
x
occurs in both atoms on a common position; in other words,
Pos(L
1
,x)
∩
Pos(L
2
,x)
=∅
.