Database Reference
In-Depth Information
of a probability p if ( 1
ε , then any value in
[ U ε, L + ε ] is an absolute ε -approximation of P () . In case ( 1 ε) · U ( 1 + ε) · L , then
any value in [ ( 1
ε)
·
p
≤ˆ
p
( 1
+
ε)
·
p . In case U
ε
L
+
L ] is a relative ε -approximation of P () .
This algorithm can be further improved by employing a different strategy on computing
bounds for the subformulas i obtained by any of the five decomposition rules. The idea is as
follows. Given a formula , we would like to efficiently derive two formulas L and U such
that the satisfying assignments of L are also satisfying assignments of and that the satisfying
assignments of are also satisfying assignments of U ; that is, ω( L ) ω() ω( U ) . We call
L and U a lower bound and respectively an upper bound of . These model-based bounds imply
probability bounds; that is, if ω( L )
ε)
·
U, ( 1
+
ε)
·
P( U ) .Wecan
make effective use of these bounds if P( L ) and P( U ) can be computed efficiently, such as when
L and U are read-once formulas.
ω()
ω( U ) , then P( L )
P ()
X 2 Y 2 . This is not read-once. Examples
of read-once lower and upper bound formulas would be L = X 1 (Y 1 Y 2 ) and, respectively, U
Consider the formula
=
X 1 Y 1
X 1 Y 2
Example 5.9
=
X 1
X 2 Y 2 . Note that these bounds are also optimal in the following sense: there are no read-once
formulas L and U that are lower and upper bounds of , respectively, such that ω( L ) ω( L )
and ω( U ) ω( U ) .
5.3.2 MONTE CARLO APPROXIMATION
Karp et al. [ 1989 ] gave a fully polynomial-time randomized approximation scheme (FPTRAS) for
model counting of DNF formulas based on Monte Carlo simulation. We refer the reader to [ Vazirani ,
2001 ] for an in-depth treatment of this approach, we only sketch here how this approach can be
modified to compute the probability of a DNF over independent discrete random variables.
The restriction to DNF formulas is important for the Karp-Luby approximation algorithm,
so in this section, we will make the following two restrictions: the input database D is a tuple-
independent database or a BID database, and the query Q is a UCQ: then, one can compute a DNF
formula for the lineage Q in polynomial time in the size of the database.
We start by describing the naïve Monte Carlo algorithm for a propositional formula over
Boolean variables. (Here we do not need to restrict to be in DNF.). Recall that represents the
space of all 2 n possible valuations of the n Boolean variables.
Definition 5.10 Naïve Estimator The Naïve Estimator for the probability of propositional for-
mula over independent Boolean random variables is:
1. Choose a valuation θ , with probability P(θ) . This means the following: every variable X
occurring in is set to true with probability P(X) .
2. If [ θ ]
is true, then return Z =
1; otherwise, return Z =
0.
Search WWH ::




Custom Search