TheQuery Evaluation Problem - Probabilistic Databases

Database Reference

In-Depth Information

CHAPTER

3

The Query Evaluation Problem

We now turn to the central problem in probabilistic databases: query evaluation. Given a query

Q and a probabilistic database D , evaluate Q on D . We consider the possible answers semantics ,

Definition 2.5 , under which the answer to a query Q is an ordered set of answer-probability pairs,

{

...

Query evaluation is a major challenge for two reasons. On one hand, the problem is provably

hard: computing the output probabilities is hard for #P (a complexity class that we review in this

chapter). On the other hand, database systems are expected to scale, and we cannot restrict their

functionality based on tractability considerations. Users' experience with common databases is that

all queries scale to large data sets, or parallelize to large number of processors, and the same behavior

is expected from probabilistic databases. We discuss in this topic a number of recent advances to

query evaluation on probabilistic databases that brings us closer towards that goal.

The query evaluation problem is formally defined as follows:

(t 1 ,p 1 ), (t 2 ,p 2 ),...

}

, such that p 1 ≥

p 2 ≥

Query Evaluation Problem For a fixed query Q : given a probabilistic database D and possible

answer tuple t , compute its marginal probability P(t

∈

Q) .

In this chapter we show that the query evaluation problem is #P-hard, even if the input database

is a tuple-independent database. The restriction on the input is without loss of generality: query

evaluation remains hard on more general inputs, as long as they allow tuple-independent databases

as a special case. This applies to BID databases, U-databases, and ultimately to pc-tables. This

hardness result sets the bar for probabilistic databases quite high. In the following two chapters we

will describe two approaches to query evaluation, extensional evaluation, and intensional evaluation,

which, together, form a powerful set of techniques for coping with the query evaluation challenge.

3.1

THE COMPLEXITY OF P ()

Recall from Section 2.5 , that the query evaluation problem on pc-tables can be reduced to the

problem of computing the probability of a lineage expression: P(

P( Q [ a/x ] ) . Thus, the

first step towards understanding the complexity of the query evaluation problem is to understand

the complexity of computing P () , for a propositional formula .

We assume in this section that all discrete variables are Boolean variables, and prove that

computing P () is hard.

To compute P () , one could apply directly its definition, Eq. (2.3) , which defines P () =

θ ∈ ω() P(θ) , where ω() is the set of satisfying assignments for . But this leads to an algorithm

a

¯

∈

Q)

=

Probabilistic Databases

Search WWH ::

Custom Search

Home