Database Reference
In-Depth Information
4.1
QUERY EVALUATION USING RULES
Throughout this section, we assume that the input database D is tuple-independent. Therefore, each
tuple t is associated with a Boolean variable X t , and we know the probability of this Boolean variable,
which we often denote P(t) = P(X t ) . The problem addressed in this section is the following: given
a Boolean query Q , compute P (Q) on some tuple-independent probabilistic database D . Recall that
Q , or, briefly, Q , represents the lineage of Q on D , and that P (Q) = P( Q ) .
4.1.1 QUERY INDEPENDENCE
Consider two Boolean queries Q 1 ,Q 2 . For a fixed probabilistic database D , they define two prob-
abilistic events. When are these two events independent? This question is of importance because
if Q 1 ,Q 2 are independent, then we can compute easily P(Q 1 Q 2 ) as P(Q 1 ) · P(Q 2 ) . In this
section, we give a simple syntactic condition on Q 1 ,Q 2 that is sufficient for independence.
For a propositional formula , let Va r () denote the set of Boolean variables X that depends
on. A formal definition of Va r () will be given in Chapter 5 ; for now, it suffices to assume that
is given as an expression, and Va r () represents all Boolean variables in that expression. Clearly, if
Va r ( Q 1 )
, then Q 1 and Q 2 are independent.
Consider a relational atom L occurring in a RC query as defined by Eq. (2.1) . The atom is
of the form L
Va r ( Q 2 )
=∅
R(v 1 ,...,v k ) where R is a relational symbol, and v 1 ,...,v k are variables and/or
constants. Denote
=
x the set of variables that occur in this atom. An image of the atom L is a ground
tuple of the form L [ a/x ]
obtained by substituting all variables
x in the atom with some constants
a .
Any Boolean variable X t
Va r ( Q ) corresponds to a tuple t that is the image of some atom in Q .
Definition 4.1 Two relational atoms L 1 and L 2 are said to be unifiable ,or to unify , if they have a
common image. In other words, there exist substitutions under which the two atoms become the
same tuple: L 1
x 2 are the variables in L 2 .
Two queries Q 1 ,Q 2 are called syntactically independent if no two atoms from Q 1 and Q 2 unify.
a 1 /
x 1 ]=
¯
L 2
a 2 /
x 2 ]
¯
, where
x 1 are the variables in L 1 and
¯
¯
If two atoms unify, then they must use the same relation symbol R . For example, the two R -
atoms in the queries R(x,a),S(x,b) and R(b,y),S(c,y) unify because both have R(b, a) as image,
but the atom R(x, a) does not unify with S(c,y) because they have different relational symbols.
On the other hand, the two R -atoms of R(x,a),S(x,b) and R(x,b),S(x,d) do not unify because
they do not have a common image (assume a and b are two distinct constants).
If two queries Q 1 ,Q 2 are syntactically independent, then they are independent probabilistic
events because, in that case, Va r ( Q 1 )
y.R(x),S(x,y)
and Q 2 =∃ z.T (z) , then these two queries are independent probabilistic events, and therefore the
probability of the query Q = R(x),S(x,y),T(z) = ( x. y.R(x),S(x,y)) ( z.T (z)) is P (Q) =
P(Q 1 ) · P(Q 2 ) . Another example is given by Q 1 =∃ x.R(x,a,b,x) and Q 2 =∃ y. z.R(y, y, z, z) .
Va r ( Q 2 )
=∅
. For example, if Q 1 =∃
x.
Search WWH ::




Custom Search