Database Reference
In-Depth Information
5. INTENSIONAL QUERY EVALUATION
Since computing P () is #P-hard, in general, several approximation techniques have been
developed in the literature. We review in Section 5.3 two approximation techniques, which have been
used in probabilistic databases. The first approximation technique keeps the same flexible, rule-based
evaluation algorithm, but it stops it at any time and returns a lower and upper bound for the true
probability. The second approximation technique is a well known Monte Carlo simulation algorithm
due to Karp and Luby [ 1983 ] and Karp et al. [ 1989 ], which is an FPTRAS (Fully Polynomial-Time
Randomized Approximation Scheme). This algorithm has the advantage that it runs in polynomial
time in the desired precision; the disadvantage is that it must always pay a relatively high price, even
if is rather simple, and that it requires the input formula to be in disjunctive normal form.
Finally, in Section 5.4 , we discuss the connection between a query's expression and the
tractability of its compilation into various compilation targets. More precisely: given a query Q ,
can one compile the lineage Q into an efficient circuit in a given target? For two targets (read-once
formulas and OBDD), this problem can be completely decided by examining the expression Q .For
the other two targets (FBDD and d-DNNF), we give only partial answers.
5.1
PROBABILITY COMPUTATION USING RULES
Let X be a set of independent random variables, where each variable X takes values from a finite
domain Dom X . The probability of the atomic event X
=
a is given by a number P(X
=
a)
∈[
0 , 1
]
,
such that a Dom X P(X = a) =
1.If is a propositional formula over atomic events of the form
X
=
a , then we denote by P () the probability that is true. More precisely, writing ω()
={
θ
|
, the probability is given by P () = θ ω() P(θ) , see Eq. (2.3) .
A special role in our discussion will be played by DNF formulas. A conjunction of atomic
events (X 1 = a 1 ) ∧···∧ (X n = a n ) is called a conjunct , and a DNF formula is a disjunction of
conjuncts. A formula is consistent ,or satisfiable if there exists at least one assignment θ of the
random variables that makes the formula true,
[ θ ]=
true
}
true . DNF formulas are easy to test for
consistency: a single conjunct is consistent iff it does not contain two atomic formulas X = a and
X
[
θ
]=
b , and a DNF formula is consistent if at least one of its conjuncts is consistent.
For example, X = a Y
=
b where a
=
= b Z = c is consistent, while X = a Y
= b X = c is not.
5.1.1 FIVE SIMPLE RULES FOR P ()
We give here five simple rules for computing P () . These rules can be used to compute the proba-
bility of any propositional formula . In some cases, they can be quite effective, but, in general, they
lead to an exponential time algorithm.
5.1.1.1 Rules 1 and 2: Independent AND and OR
Consider the following example: = XY XZ . Re-write as X (Y Z) and obtain P () =
P(X) · P(Y Z) because the two formulas X and Y Z are independent events. Furthermore,
Search WWH ::




Custom Search