Database Reference
In-Depth Information
4. EXTENSIONAL QUERY EVALUATION
These too are independent because the two atoms R(x,a,b,x) and R(y,y,z,z,) do not unify. More
generally, pairwise syntactic independence implies independence as probabilistic events:
Proposition 4.2 Let Q 1 ,Q 2 ,...,Q k be queries that are pairwise syntactically independent. Then
Q 1 ,...,Q k are independent probabilistic events.
The converse does not hold in general: two queries may be independent probabilistic
events, yet two of their atoms unify; this is because of the absorption law. An example is given
by Q 1 =∃ x. y. z. u.R(x,y,z,z,u) R(x,x,x,y,y) and Q 2 = R(a, a, b, b, c) . If the lineage
of Q 1 contains the Boolean variable X R(a,a,b,b,c) , then the lineage must contain the conjunct
X R(a,a,b,b,c) X R(a,a,a,a,a) . This means that the database contains the tuple R(a, a, a, a, a) , which
implies that the lineage also contains the conjunct X R(a,a,a,a,a) X R(a,a,a,a,a) = X R(a,a,a,a,a) .By
the absorption law X R(a,a,b,b,c) X R(a,a,a,a,a) X R(a,a,a,a,a) = X R(a,a,a,a,a) , which means that
X R(a,a,b,b,c)
Va r ( Q 1 ) ; therefore, Q 1 and Q 2 are independent, although two of their atoms unify.
It is possible to decide whether two queries Q 1 and Q 2 are independent probabilistic events
(and this problem is 2 -complete in the size of the queries [ Miklau and Suciu , 2004 ]). However,
we will not need that test in order to compute query probability, but we will rely only on the syntactic
independence: checking if Q 1 and Q 2 are syntactically independent can be done in polynomial time
in the size of the queries.
4.1.2 SIX SIMPLE RULES FOR P (Q)
We now describe an approach for computing P (Q) on a tuple-independent probabilistic database
D , by applying six simple computation rules. We will illustrate several examples in Subsection 4.1.4 .
4.1.2.1 Rule 1: Independent Join
Suppose a relational query can be written as Q 1 Q 2 where Q 1 ,Q 2 are two syntactically indepen-
dent queries. Then:
Independent-join
P(Q 1 Q 2 ) = P(Q 1 ) · P(Q 1 )
(4.1)
4.1.2.2 Rule 2: Independent Project
Consider an atom L in the query Q , and let x be a variable. Denote Pos(L, x) ⊆[
arity(L) ]
the set
of positions where x occurs in L ; this set may be empty.
Definition 4.3 Let Q be a relational query of the form Q =∃ x.Q .
￿ The variable x is called a root variable if every atom L in Q contains the variable x ; in other
words, Pos(L, x) =∅
.
￿ The variable x is called a separator variable if for any two atoms L 1 ,L 2 in Q that unify, x
occurs in both atoms on a common position; in other words, Pos(L 1 ,x)
Pos(L 2 ,x) =∅
.
Search WWH ::




Custom Search