Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

CHAPTER

4

Extensional Query Evaluation

There are two approaches to query evaluation. In extensional query evaluation, the evaluation process

is guided entirely by the query expression Q . We discuss extensional query evaluation in this chapter.

When extensional evaluation is possible, then the data complexity of Q is in polynomial time, but

we have seen that some queries are provably hard, and therefore not all queries can be evaluated

using the extensional approach. Intensional query evaluation first computes the query's lineage, then

computes the probability of the lineage expression; we discuss it in the next chapter. The intensional

approach reduces query evaluation to the problem of computing the probability of a propositional

formula. Intensional evaluation works for every query, but by losing the information on the query

expression Q , it can perform much worse than extensional query evaluation.

The queries we consider are in the Relational Calculus (RC), or one of its fragments, Unions

of Conjunctive Queries (UCQ) or Conjunctive Queries (CQ), see Section 2.1 . Thus, queries are

built from atomic predicates using the connectives

.

Throughout this chapter, we restrict the input database to be a tuple-independent or BID

database. This will be sufficient for most practical purposes because as we argued in Section 2.7 ,a

probabilistic database should be decomposed into a tuple-independent or BID database. In other

words, if D is an arbitrary probabilistic database, then it should be written as D

∧ , ∨ , ∃ , ¬

= V( D 0 ) , where

D 0 is tuple-independent or BID, and V is a set of relational queries (views). Then, evaluating a

query Q on D reduces to evaluating the composed query Q ◦ V on D 0 . Since relational queries are

closed under query composition, it means that, for all practical purposes, it suffices to study query

evaluation on tuple-independent or BID tables.

We start our discussion of extensional query evaluation in Section 4.1 , by describing six simple

query evaluation rules. Each rule reduces the problem of evaluating Q , to the problem of evaluating

some simpler queries, Q 1 ,...,Q n . Thus, each application of a rule simplifies the query, until it

becomes a ground tuple; in that case, we simply look up its probability in the input database.

This rule-based approach is an example of extensional query evaluation since it computes the query

probability by examining only the query expression, without expanding its lineage.

When the rules succeed in computing the query, we say that the query is safe ; in that case,

the data complexity is in polynomial time. When the rules fail to evaluate the query, then we

call the query unsafe ; as we have seen in Chapter 3 , some queries are provably hard, and these

(unsurprisingly) also turn out to be unsafe. We prove that, for Unions of Conjunctive Queries, the

rules are complete, in the following sense. For any unsafe query, its data complexity is provably hard

for #P. This property is called a dichotomy theorem since every query is either in polynomial time

(when the query is safe) or provably hard for #P (when the query is unsafe). Such a dichotomy is

Search WWH ::

Custom Search

Home