Intensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

of a probability p if ( 1

ε , then any value in

[ U − ε, L + ε ] is an absolute ε -approximation of P () . In case ( 1 − ε) · U ≤ ( 1 + ε) · L , then

any value in [ ( 1

−

ε)

≤ˆ

≤

( 1

ε)

p . In case U

−

≤

L ] is a relative ε -approximation of P () .

This algorithm can be further improved by employing a different strategy on computing

bounds for the subformulas i obtained by any of the five decomposition rules. The idea is as

follows. Given a formula , we would like to efficiently derive two formulas L and U such

that the satisfying assignments of L are also satisfying assignments of and that the satisfying

assignments of are also satisfying assignments of U ; that is, ω( L ) ⊆ ω() ⊆ ω( U ) . We call

L and U a lower bound and respectively an upper bound of . These model-based bounds imply

probability bounds; that is, if ω( L )

−

ε)

U, ( 1

ε)

P( U ) .Wecan

make effective use of these bounds if P( L ) and P( U ) can be computed efficiently, such as when

L and U are read-once formulas.

⊆

ω()

⊆

ω( U ) , then P( L )

≤

P ()

≤

X 2 Y 2 . This is not read-once. Examples

of read-once lower and upper bound formulas would be L = X 1 (Y 1 ∨ Y 2 ) and, respectively, U

Consider the formula

X 1 Y 1 ∨

X 1 Y 2 ∨

Example 5.9

X 1 ∨

X 2 Y 2 . Note that these bounds are also optimal in the following sense: there are no read-once

formulas L and U that are lower and upper bounds of , respectively, such that ω( L ) ⊂ ω( L )

and ω( U ) ⊂ ω( U ) .

5.3.2 MONTE CARLO APPROXIMATION

Karp et al. [ 1989 ] gave a fully polynomial-time randomized approximation scheme (FPTRAS) for

model counting of DNF formulas based on Monte Carlo simulation. We refer the reader to [ Vazirani ,

2001 ] for an in-depth treatment of this approach, we only sketch here how this approach can be

modified to compute the probability of a DNF over independent discrete random variables.

The restriction to DNF formulas is important for the Karp-Luby approximation algorithm,

so in this section, we will make the following two restrictions: the input database D is a tuple-

independent database or a BID database, and the query Q is a UCQ: then, one can compute a DNF

formula for the lineage Q in polynomial time in the size of the database.

We start by describing the naïve Monte Carlo algorithm for a propositional formula over

Boolean variables. (Here we do not need to restrict to be in DNF.). Recall that represents the

space of all 2 n possible valuations of the n Boolean variables.

Definition 5.10 Naïve Estimator The Naïve Estimator for the probability of propositional for-

mula over independent Boolean random variables is:

1. Choose a valuation θ ∈ , with probability P(θ) . This means the following: every variable X

occurring in is set to true with probability P(X) .

2. If [ θ ]

is true, then return Z =

1; otherwise, return Z =

Probabilistic Databases

Search WWH ::

Custom Search

Home