Intensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

and for that it needs at most w 1 ·

w 2 gates; a leaf node returnsa1ifboth OBDDs returned 1 (for

1 ∧ 2 ) or if at least one of the two OBDDs returned 1 (for 1 ∨ 2 ).

5.2.4 READ-ONCE FORMULAS

We consider read-once formulas as a simple case of a circuit, where all non-leaf gates are either

independent-AND , independent-OR ,or NOT gates, and all variables are on the leaf nodes. Read-

once are different from the other compilation targets in that some formulas are not read-once, but

every formula can be compiled into any of the other three targets.

5.3

APPROXIMATING P ()

The rules in Subsection 5.1.1 are good heuristics for computing P () , but we should expect

them to run in exponential time on some formulas because computing P () is provably hard

( Theorem 3.1 ). In probabilistic databases, however, we often do not need to compute P () exactly.

In many applications we need probabilities only to rank the answers to a query, and approximate

values of the probabilities are often sufficient for ranking (as we discuss in Section 6.1 ). Even when

probability values need to be returned to the user, it is often desirable to improve performance by

sacrificing precision.

We next present two approximation algorithms, a deterministic algorithm that works on any

propositional formula but cannot guarantee polynomial running time as a function of the desired

precision and a randomized algorithm that works on DNF formulas and can guarantee polynomial

time in the desired precision. Both these algorithms can becomes quite effective in probabilistic

databases when combined with the top-k evaluation technique discussed in Section 6.1 .

5.3.1 A DETERMINISTIC APPROXIMATION ALGORITHM

Olteanu et al. [ 2010 ] describe an approach by which instead of computing the exact probability

p = P () of a formula , one returns an interval

[ L, U ]

such that L ≤ p ≤ U . We start with the

following two simple approximations rules:

max (P ( 1 ), P ( 2 )) ≤ P( 1 ∨ 2 ) ≤

min (P ( 1 ) + P( 2 ), 1 )

(5.6)

max ( 0 ,P( 1 ) + P( 2 ) − 1 ) ≤ P( 1 ∧ 2 ) ≤ min (P ( 1 ), P ( 2 ))

(5.7)

By applying the rules repeatedly, together with P( ¬ ) = 1 − , we can obtain a lower and

upper bound for every formula , P ()

can be computed

in linear time in the size of . In the special case of a DNF formula , we only need Eq. (5.6) for the

approximation: once we reach conjuncts, we can compute their probabilities exactly, by multiplying

the probabilities of their atomic events.

We will modify Algorithm 2 such that it can stop at any time and return an approximate

value of P () ; in other words, we modify it to trade off performance for accuracy. We start by

∈[

L, U

]

. Importantly, the bounds

[

L, U

]

Probabilistic Databases

Search WWH ::

Custom Search

Home