Database Reference
In-Depth Information
where
X
1
and
X
1
are two
clones
of
X
1
. In other words, the reason why the plan on the left is unsafe is
because it fails to recognize that two occurrences of
X
1
are the same variable, but instead treats them
as independent variables, with the same probability as
X
1
. This is captured by
d
, where we have
replaced the two occurrences of
X
1
with two distinct variables
X
1
and
X
1
, with the same probability
p
1
as
X
1
. The following proposition shows that
P ()
P(
d
)
:
≤
Proposition 4.31
Consider two propositional formulas:
d
k
X
k
=
0
∨
1
X
∨
...
∨
k
X
=
0
∨
1
X
1
∨
...
∨
such that P(X)
=
P(X
1
)
=
...
=
P(X
k
) and none of the formulas
i
, i
=
0
,k contains any of the
variables X, X
1
,...,X
k
. Then P ()
≤
P(
d
).
Intuitively, the proposition says this. We have a formula
that has multiple occurrences of
some variable
X
: all these occurrences are positive (i.e., non-negated), and separated by
.Ifwe
“clone” each occurrence into a new variable
X
i
, such that all the clones are independent and have
the same probability as
X
, then the probability can only increase. This cloning is called
dissocia-
tion
[
Gatterbauer et al.
,
2010
,
Gatterbauer and Suciu
,
2011
] and is related to
relaxation
[
Darwiche
,
2010
] but differs from the latter in that it gives an upper bound guarantee.
Proof.
We claim that
P()
≤
P(
d
)
, where
d
∨
2
X
2
=
1
X
∨
2
X
=
1
X
1
∨
and
x
P(X
2
)
. The claim follows from the inclusion-exclusion formula:
P()
=
xP(
1
)
+
xP(
2
)
−
xP(
1
2
)
≤
P(
d
)
=
xP(
1
)
+
xP(
2
)
−
x
2
P(
1
2
)
thus
P()
≤
P(
d
)
because
x
2
=
P(X)
=
P(X
1
)
=
≤
x
.
We prove now the proposition by induction on
k
. For the base case,
k
=
2, denote
i
=
(
¬
0
)
∧
i
, for
i
=
1
,
2, then we have:
d
d
=
0
∨
(
¬
0
)
1
X
1
∨
(
¬
0
)
2
X
2
=
0
∨
=
0
∨
(
¬
0
)
1
X
∨
(
¬
0
)
2
X
=
0
∨
The base case follows from
P(
0
∨
)
=
P(
0
)
+
P()
≤
P(
0
)
+
P(
d
)
=
P(
0
∨
d
)
.We
prove now the inductive step. Consider:
d
=
0
∨
1
X
1
∨
...
∨
k
−
2
X
k
−
2
∨
k
−
1
X
k
−
1
∨
k
X
k
=
k
)Y
=
0
∨
1
X
∨
...
∨
k
−
2
X
∨
(
k
−
1
∨
k
)X
Then
P ()
≤
P(
)
follows by induction (for
k
−
1), and
P(
)
≤
P(
d
)
follows from the base
case (
k
=
0
∨
1
X
1
∨
...
∨
k
−
2
X
k
−
2
∨
(
k
−
1
∨
2).