Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

where X 1 and X 1 are two clones of X 1 . In other words, the reason why the plan on the left is unsafe is

because it fails to recognize that two occurrences of X 1 are the same variable, but instead treats them

as independent variables, with the same probability as X 1 . This is captured by d , where we have

replaced the two occurrences of X 1 with two distinct variables X 1 and X 1 , with the same probability

p 1 as X 1 . The following proposition shows that P ()

P( d ) :

≤

Proposition 4.31 Consider two propositional formulas:

k X k

= 0 ∨ 1 X ∨ ... ∨ k X

0 ∨

1 X 1 ∨

...

∨

such that P(X) = P(X 1 ) = ... = P(X k ) and none of the formulas i , i = 0 ,k contains any of the

variables X, X 1 ,...,X k . Then P () ≤ P( d ).

Intuitively, the proposition says this. We have a formula that has multiple occurrences of

some variable X : all these occurrences are positive (i.e., non-negated), and separated by

.Ifwe

“clone” each occurrence into a new variable X i , such that all the clones are independent and have

the same probability as X , then the probability can only increase. This cloning is called dissocia-

tion [ Gatterbauer et al. , 2010 , Gatterbauer and Suciu , 2011 ] and is related to relaxation [ Darwiche ,

2010 ] but differs from the latter in that it gives an upper bound guarantee.

Proof. We claim that P() ≤ P( d ) , where

∨

2 X 2

= 1 X ∨ 2 X

1 X 1 ∨

and x

P(X 2 ) . The claim follows from the inclusion-exclusion formula:

P() = xP( 1 ) + xP( 2 ) − xP( 1 2 ) ≤ P( d ) = xP( 1 ) + xP( 2 ) − x 2 P( 1 2 )

thus P() ≤ P( d ) because x 2

P(X)

P(X 1 )

≤ x .

We prove now the proposition by induction on k . For the base case, k

2, denote i =

( ¬ 0 ) ∧ i , for i = 1 , 2, then we have:

0 ∨

(

0 ) 1 X 1 ∨

(

0 ) 2 X 2 =

0 ∨

= 0 ∨ ( ¬ 0 ) 1 X ∨ ( ¬ 0 ) 2 X = 0 ∨

The base case follows from P( 0 ∨ ) = P( 0 ) + P() ≤ P( 0 ) + P( d ) = P( 0 ∨ d ) .We

prove now the inductive step. Consider:

= 0 ∨ 1 X 1 ∨ ... ∨ k − 2 X k − 2 ∨ k − 1 X k − 1 ∨ k X k

k )Y

= 0 ∨ 1 X ∨ ... ∨ k − 2 X ∨ ( k − 1 ∨ k )X

Then P () ≤ P( ) follows by induction (for k − 1), and P( ) ≤ P( d ) follows from the base

case ( k =

0 ∨

1 X 1 ∨

...

∨

k − 2 X k − 2 ∨

( k − 1 ∨

2).

Probabilistic Databases

Search WWH ::

Custom Search

Home