Database Reference
In-Depth Information
and for that it needs at most w 1 ·
w 2 gates; a leaf node returnsa1ifboth OBDDs returned 1 (for
1 2 ) or if at least one of the two OBDDs returned 1 (for 1 2 ).
5.2.4 READ-ONCE FORMULAS
We consider read-once formulas as a simple case of a circuit, where all non-leaf gates are either
independent-AND , independent-OR ,or NOT gates, and all variables are on the leaf nodes. Read-
once are different from the other compilation targets in that some formulas are not read-once, but
every formula can be compiled into any of the other three targets.
5.3
APPROXIMATING P ()
The rules in Subsection 5.1.1 are good heuristics for computing P () , but we should expect
them to run in exponential time on some formulas because computing P () is provably hard
( Theorem 3.1 ). In probabilistic databases, however, we often do not need to compute P () exactly.
In many applications we need probabilities only to rank the answers to a query, and approximate
values of the probabilities are often sufficient for ranking (as we discuss in Section 6.1 ). Even when
probability values need to be returned to the user, it is often desirable to improve performance by
sacrificing precision.
We next present two approximation algorithms, a deterministic algorithm that works on any
propositional formula but cannot guarantee polynomial running time as a function of the desired
precision and a randomized algorithm that works on DNF formulas and can guarantee polynomial
time in the desired precision. Both these algorithms can becomes quite effective in probabilistic
databases when combined with the top-k evaluation technique discussed in Section 6.1 .
5.3.1 A DETERMINISTIC APPROXIMATION ALGORITHM
Olteanu et al. [ 2010 ] describe an approach by which instead of computing the exact probability
p = P () of a formula , one returns an interval
[ L, U ]
such that L p U . We start with the
following two simple approximations rules:
max (P ( 1 ), P ( 2 )) P( 1 2 )
min (P ( 1 ) + P( 2 ), 1 )
(5.6)
max ( 0 ,P( 1 ) + P( 2 ) 1 ) P( 1 2 ) min (P ( 1 ), P ( 2 ))
(5.7)
By applying the rules repeatedly, together with P( ¬ ) = 1 , we can obtain a lower and
upper bound for every formula , P ()
can be computed
in linear time in the size of . In the special case of a DNF formula , we only need Eq. (5.6) for the
approximation: once we reach conjuncts, we can compute their probabilities exactly, by multiplying
the probabilities of their atomic events.
We will modify Algorithm 2 such that it can stop at any time and return an approximate
value of P () ; in other words, we modify it to trade off performance for accuracy. We start by
∈[
L, U
]
. Importantly, the bounds
[
L, U
]
Search WWH ::




Custom Search