Database Reference
In-Depth Information
CHAPTER
3
The Query Evaluation Problem
We now turn to the central problem in probabilistic databases: query evaluation. Given a query
Q and a probabilistic database D , evaluate Q on D . We consider the possible answers semantics ,
Definition 2.5 , under which the answer to a query Q is an ordered set of answer-probability pairs,
{
...
Query evaluation is a major challenge for two reasons. On one hand, the problem is provably
hard: computing the output probabilities is hard for #P (a complexity class that we review in this
chapter). On the other hand, database systems are expected to scale, and we cannot restrict their
functionality based on tractability considerations. Users' experience with common databases is that
all queries scale to large data sets, or parallelize to large number of processors, and the same behavior
is expected from probabilistic databases. We discuss in this topic a number of recent advances to
query evaluation on probabilistic databases that brings us closer towards that goal.
The query evaluation problem is formally defined as follows:
(t 1 ,p 1 ), (t 2 ,p 2 ),...
}
, such that p 1
p 2
Query Evaluation Problem For a fixed query Q : given a probabilistic database D and possible
answer tuple t , compute its marginal probability P(t
Q) .
In this chapter we show that the query evaluation problem is #P-hard, even if the input database
is a tuple-independent database. The restriction on the input is without loss of generality: query
evaluation remains hard on more general inputs, as long as they allow tuple-independent databases
as a special case. This applies to BID databases, U-databases, and ultimately to pc-tables. This
hardness result sets the bar for probabilistic databases quite high. In the following two chapters we
will describe two approaches to query evaluation, extensional evaluation, and intensional evaluation,
which, together, form a powerful set of techniques for coping with the query evaluation challenge.
3.1
THE COMPLEXITY OF P ()
Recall from Section 2.5 , that the query evaluation problem on pc-tables can be reduced to the
problem of computing the probability of a lineage expression: P(
P( Q [ a/x ] ) . Thus, the
first step towards understanding the complexity of the query evaluation problem is to understand
the complexity of computing P () , for a propositional formula .
We assume in this section that all discrete variables are Boolean variables, and prove that
computing P () is hard.
To compute P () , one could apply directly its definition, Eq. (2.3) , which defines P () =
θ ω() P(θ) , where ω() is the set of satisfying assignments for . But this leads to an algorithm
a
¯
Q)
=
Search WWH ::




Custom Search