Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

4.1

QUERY EVALUATION USING RULES

Throughout this section, we assume that the input database D is tuple-independent. Therefore, each

tuple t is associated with a Boolean variable X t , and we know the probability of this Boolean variable,

which we often denote P(t) = P(X t ) . The problem addressed in this section is the following: given

a Boolean query Q , compute P (Q) on some tuple-independent probabilistic database D . Recall that

Q , or, briefly, Q , represents the lineage of Q on D , and that P (Q) = P( Q ) .

4.1.1 QUERY INDEPENDENCE

Consider two Boolean queries Q 1 ,Q 2 . For a fixed probabilistic database D , they define two prob-

abilistic events. When are these two events independent? This question is of importance because

if Q 1 ,Q 2 are independent, then we can compute easily P(Q 1 ∧ Q 2 ) as P(Q 1 ) · P(Q 2 ) . In this

section, we give a simple syntactic condition on Q 1 ,Q 2 that is sufficient for independence.

For a propositional formula , let Va r () denote the set of Boolean variables X that depends

on. A formal definition of Va r () will be given in Chapter 5 ; for now, it suffices to assume that

is given as an expression, and Va r () represents all Boolean variables in that expression. Clearly, if

Va r ( Q 1 )

, then Q 1 and Q 2 are independent.

Consider a relational atom L occurring in a RC query as defined by Eq. (2.1) . The atom is

of the form L

∩

Va r ( Q 2 )

=∅

R(v 1 ,...,v k ) where R is a relational symbol, and v 1 ,...,v k are variables and/or

constants. Denote

x the set of variables that occur in this atom. An image of the atom L is a ground

tuple of the form L [ a/x ]

obtained by substituting all variables

x in the atom with some constants

a .

Any Boolean variable X t ∈

Va r ( Q ) corresponds to a tuple t that is the image of some atom in Q .

Definition 4.1 Two relational atoms L 1 and L 2 are said to be unifiable ,or to unify , if they have a

common image. In other words, there exist substitutions under which the two atoms become the

same tuple: L 1 [¯

x 2 are the variables in L 2 .

Two queries Q 1 ,Q 2 are called syntactically independent if no two atoms from Q 1 and Q 2 unify.

a 1 /

x 1 ]=

L 2 [¯

a 2 /

x 2 ]

, where

x 1 are the variables in L 1 and

If two atoms unify, then they must use the same relation symbol R . For example, the two R -

atoms in the queries R(x,a),S(x,b) and R(b,y),S(c,y) unify because both have R(b, a) as image,

but the atom R(x, a) does not unify with S(c,y) because they have different relational symbols.

On the other hand, the two R -atoms of R(x,a),S(x,b) and R(x,b),S(x,d) do not unify because

they do not have a common image (assume a and b are two distinct constants).

If two queries Q 1 ,Q 2 are syntactically independent, then they are independent probabilistic

events because, in that case, Va r ( Q 1 )

y.R(x),S(x,y)

and Q 2 =∃ z.T (z) , then these two queries are independent probabilistic events, and therefore the

probability of the query Q = R(x),S(x,y),T(z) = ( ∃ x. ∃ y.R(x),S(x,y)) ∧ ( ∃ z.T (z)) is P (Q) =

P(Q 1 ) · P(Q 2 ) . Another example is given by Q 1 =∃ x.R(x,a,b,x) and Q 2 =∃ y. ∃ z.R(y, y, z, z) .

∩

Va r ( Q 2 )

=∅

. For example, if Q 1 =∃

∃

Probabilistic Databases

Search WWH ::

Custom Search

Home