Database Reference
In-Depth Information
3. THE QUERY EVALUATION PROBLEM
whose running time is exponential in the number of Boolean variables, because one would have to
iterate over all 2 n assignments θ , check if they satisfy , then add the probabilities of those that do.
Typically, n is the number of records in the database, hence this approach is prohibitive. It turns out
that there is, essentially, no better way of computing P () in general: this problem is provably hard.
In order to state this formally, we introduce two problems.
Model Counting Problem Given a propositional formula , count the number of satisfying as-
signments # , i.e., compute # =| ω() |
.
Probability Computation Problem Given a propositional formula and a probability P(X)
[ 0 , 1 ]
for each Boolean variable X , compute the probability P () = θ ω() P(θ) .
Model counting is a special case of probability computation, because any algorithm for com-
puting P () can be used to compute # . Define P(X) = 1 / 2 for every variable X : then P(θ) =
1 / 2 n
2 n .
A classical result, which we review here, is that the model counting problem is hard, even if is
restricted to a simple class of propositional formulas, as discussed next: this implies immediately that
the probability computation problem is also hard.
Recall that SAT is NP-complete, where SAT is the satisfiability problem: “given , check
if is satisfiable”. The decision problem 3SAT, where is restricted to a 3CNF formula, is also
NP-complete, but the problem 2SAT is in polynomial time.
The complexity class #P was introduced by Valiant [ 1979 ] and consists of all function problems
of the following type: given a polynomial-time, non-deterministic Turing machine, compute the
number of accepting computations. The model counting problem, “given , compute # ”, is also
denoted #SAT, and is obviously in #P. Note that any algorithm solving #SAT can be used to solve
SAT, by simply using the former to obtain # and then testing whether # > 0.
Valiant proved that #SAT is hard for #P. He also showed that computing # remains hard for
#P even if is restricted to be in P2CNF, the class of positive 2CNF formulas, i.e., formulas where
each clause consists of two positive literals, X i X j . It follows immediately that #SAT remains
hard for #P even for P2DNF formulas, i.e., formulas that are disjunctions of conjuncts of the form
X i X j . This has been further strengthened by Provan and Ball [ 1983 ]; they proved the following
result, which is the most important hardness result used in probabilistic databases:
for every assignment θ , where n is the number of variables, and therefore #
=
P ()
·
Theorem 3.1
Let X 1 ,X 2 ,... and Y 1 ,Y 2 ,... be two disjoint sets of Boolean variables.
￿A Positive, Partitioned 2-DNF propositional formula is a DNF formula of the form:
=
(i,j)
X i Y j
E
The #PP2DNF problem is “given a PP2DNF formula , compute # ”.
Search WWH ::




Custom Search