Database Reference
In-Depth Information
5. INTENSIONAL QUERY EVALUATION
in polynomial time. Thus, in the rest of this chapter, we will assume that we may use
V ()
instead
Va r ()
, in order to compute it efficiently.
Proposition 5.3
is somewhat surprising in the sense that it states that
pairwise
syntactic in-
dependence implies independence. This shows that syntactic independence is a stronger notion
than independence because, in general, pairwise independent events are
not
necessarily syntactically
independent. For example, consider:
1
=
X
2
=
Y
3
=
(
¬
X
∧¬
Y)
∨
(X
∧
Y)
Assuming
P(X)
=
P(Y)
=
P(Z)
=
1
/
2, we have
P(
1
)
=
P(
2
)
=
P(
3
)
=
1
/
2. Also,
1
∧
2
=
1
∧
3
=
2
∧
3
=
XY
, hence their probabilities are 1
/
4. Therefore, the three events
1
,
2
,
3
are pairwise independent,
P(
i
∧
j
)
=
P(
i
)
·
P(
j
)
, for
i
=
j
. On the other
hand,
P(
1
∧
2
∧
3
)
=
P(XY)
=
1
/
4
=
P(
1
)
·
P(
2
)
·
P(
3
)
=
1
/
8, meaning that the
three events are not independent.
It is interesting to note that a converse to
Proposition 5.3
also holds:
P(
2
),
for any choice of the probabilities P(X
=
a)of the atomic events, then
1
,
2
are syntactically independent.
If
1
,
2
are independent probabilistic events, i.e., P(
1
∧
2
)
=
P(
1
)
·
Proposition 5.4
Proof.
For each random variable
X
i
, with domain
Dom
X
i
={
a
0
,a
1
,...,a
m
i
}
, we introduce
m
i
real variables
x
i,
1
,x
i,
2
,...,x
i,m
i
representing the probabilities of the atomic events
x
i,
1
=
P(X
=
a
1
),...,x
i,m
i
=
P(X
=
a
m
i
)
: we do not need a variable for
P(X
=
a
0
)
since this value is uniquely
defined as 1
x
i,m
i
. We claim that the probability
P ()
of any propositional
formula
is a multi-variate polynomial in the variables
x
i,j
with the following properties: (a) it is
a multi-linear, i.e., the degree of each variable
x
i,j
is a most 1, and (b) if two variables
x
i
1
,j
1
and
x
i
2
,j
2
occur in the same monomial, then
i
1
=
i
2
. Indeed, the claim follows from the fact that for
each assignment
θ
, the probability
P(θ)
given by
Eq. (2.2)
obviously satisfies these two properties,
and the fact
P ()
is a sum of such polynomials (
Eq. (2.3)
).
Using this claim, we prove the proposition. Let
be any propositional formula and
θ
an
assignment to some of its variables, i.e.,
θ(X
i
)
=
a
j
, for some subset of variables
X
i
. Then, the
polynomial
P(
[
θ
]
)
is obtained from
P()
by performing the following substitution on the variables
x
i,j
for which
θ(X
i
)
is defined,
θ(X
i
)
−
x
i,
1
−
x
i,
2
−
...
−
=
a
j
:if
j>
0 then substitute
x
i,j
=
1 and set all other
x
i,k
=
0, and if
j
=
0 then set substitute all variables
x
i,
1
=
x
i,
2
=
...
=
0.
Suppose now
Eq. (5.1)
holds as an identity. Assume that
X
i
∈
Va r (
1
)
∩
Va r (
2
)
: we will
derive a contradiction. Since
X
i
∈
Va r (
1
)
, there exists some substitution
θ
of all variables other than
X
i
, and two values
a
j
1
; that means that one
of the two expressions is
true
and the other
false
, implying
P(
1
[
θ,X
i
→
a
j
1
]
)
=
P(
1
[
θ,X
i
→
a
j
2
]
)
(one is 0 the other is 1). The polynomial
P(
1
[
θ
]
)
can only depend on the variables
x
i,
1
,x
i,
2
,...
, since all variables other than
X
i
are substituted with constants. Since it cannot be
and
a
j
2
such that
1
[
θ,X
i
→
a
j
1
] =
1
[
θ,X
i
→
a
j
2
]