Database Reference
In-Depth Information
of a probability
p
if
(
1
ε
, then any value in
[
U
−
ε, L
+
ε
] is an absolute
ε
-approximation of
P ()
. In case
(
1
−
ε)
·
U
≤
(
1
+
ε)
·
L
, then
any value in [
(
1
−
ε)
·
p
≤ˆ
p
≤
(
1
+
ε)
·
p
. In case
U
−
ε
≤
L
+
L
] is a relative
ε
-approximation of
P ()
.
This algorithm can be further improved by employing a different strategy on computing
bounds for the subformulas
i
obtained by any of the five decomposition rules. The idea is as
follows. Given a formula
, we would like to efficiently derive two formulas
L
and
U
such
that the satisfying assignments of
L
are also satisfying assignments of
and that the satisfying
assignments of
are also satisfying assignments of
U
; that is,
ω(
L
)
⊆
ω()
⊆
ω(
U
)
. We call
L
and
U
a lower bound and respectively an upper bound of
. These model-based bounds imply
probability bounds; that is, if
ω(
L
)
−
ε)
·
U, (
1
+
ε)
·
P(
U
)
.Wecan
make effective use of these bounds if
P(
L
)
and
P(
U
)
can be computed efficiently, such as when
L
and
U
are read-once formulas.
⊆
ω()
⊆
ω(
U
)
, then
P(
L
)
≤
P ()
≤
X
2
Y
2
. This is not read-once. Examples
of read-once lower and upper bound formulas would be
L
=
X
1
(Y
1
∨
Y
2
)
and, respectively,
U
Consider the formula
=
X
1
Y
1
∨
X
1
Y
2
∨
Example 5.9
=
X
1
∨
X
2
Y
2
. Note that these bounds are also optimal in the following sense: there are no read-once
formulas
L
and
U
that are lower and upper bounds of
, respectively, such that
ω(
L
)
⊂
ω(
L
)
and
ω(
U
)
⊂
ω(
U
)
.
5.3.2 MONTE CARLO APPROXIMATION
Karp et al.
[
1989
] gave a fully polynomial-time randomized approximation scheme (FPTRAS) for
model counting of DNF formulas based on Monte Carlo simulation. We refer the reader to [
Vazirani
,
2001
] for an in-depth treatment of this approach, we only sketch here how this approach can be
modified to compute the probability of a DNF over independent discrete random variables.
The restriction to DNF formulas is important for the Karp-Luby approximation algorithm,
so in this section, we will make the following two restrictions: the input database
D
is a tuple-
independent database or a BID database, and the query
Q
is a UCQ: then, one can compute a DNF
formula for the lineage
Q
in polynomial time in the size of the database.
We start by describing the naïve Monte Carlo algorithm for a propositional formula
over
Boolean variables. (Here we do not need to restrict
to be in DNF.). Recall that
represents the
space of all 2
n
possible valuations of the
n
Boolean variables.
Definition 5.10 Naïve Estimator
The Naïve Estimator for the probability of propositional for-
mula
over independent Boolean random variables is:
1. Choose a valuation
θ
∈
, with probability
P(θ)
. This means the following: every variable
X
occurring in
is set to true with probability
P(X)
.
2. If
[
θ
]
is true, then return
Z
=
1; otherwise, return
Z
=
0.