Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

4. EXTENSIONAL QUERY EVALUATION

These too are independent because the two atoms R(x,a,b,x) and R(y,y,z,z,) do not unify. More

generally, pairwise syntactic independence implies independence as probabilistic events:

Proposition 4.2 Let Q 1 ,Q 2 ,...,Q k be queries that are pairwise syntactically independent. Then

Q 1 ,...,Q k are independent probabilistic events.

The converse does not hold in general: two queries may be independent probabilistic

events, yet two of their atoms unify; this is because of the absorption law. An example is given

by Q 1 =∃ x. ∃ y. ∃ z. ∃ u.R(x,y,z,z,u) ∧ R(x,x,x,y,y) and Q 2 = R(a, a, b, b, c) . If the lineage

of Q 1 contains the Boolean variable X R(a,a,b,b,c) , then the lineage must contain the conjunct

X R(a,a,b,b,c) ∧ X R(a,a,a,a,a) . This means that the database contains the tuple R(a, a, a, a, a) , which

implies that the lineage also contains the conjunct X R(a,a,a,a,a) ∧ X R(a,a,a,a,a) = X R(a,a,a,a,a) .By

the absorption law X R(a,a,b,b,c) ∧ X R(a,a,a,a,a) ∨ X R(a,a,a,a,a) = X R(a,a,a,a,a) , which means that

X R(a,a,b,b,c) ∈

Va r ( Q 1 ) ; therefore, Q 1 and Q 2 are independent, although two of their atoms unify.

It is possible to decide whether two queries Q 1 and Q 2 are independent probabilistic events

(and this problem is 2 -complete in the size of the queries [ Miklau and Suciu , 2004 ]). However,

we will not need that test in order to compute query probability, but we will rely only on the syntactic

independence: checking if Q 1 and Q 2 are syntactically independent can be done in polynomial time

in the size of the queries.

4.1.2 SIX SIMPLE RULES FOR P (Q)

We now describe an approach for computing P (Q) on a tuple-independent probabilistic database

D , by applying six simple computation rules. We will illustrate several examples in Subsection 4.1.4 .

4.1.2.1 Rule 1: Independent Join

Suppose a relational query can be written as Q 1 ∧ Q 2 where Q 1 ,Q 2 are two syntactically indepen-

dent queries. Then:

Independent-join

P(Q 1 ∧ Q 2 ) = P(Q 1 ) · P(Q 1 )

(4.1)

4.1.2.2 Rule 2: Independent Project

Consider an atom L in the query Q , and let x be a variable. Denote Pos(L, x) ⊆[

arity(L) ]

the set

of positions where x occurs in L ; this set may be empty.

Definition 4.3 Let Q be a relational query of the form Q =∃ x.Q .

The variable x is called a root variable if every atom L in Q contains the variable x ; in other

words, Pos(L, x) =∅

The variable x is called a separator variable if for any two atoms L 1 ,L 2 in Q that unify, x

occurs in both atoms on a common position; in other words, Pos(L 1 ,x) ∩

Pos(L 2 ,x) =∅

Search WWH ::

Custom Search

Home