Database Reference
In-Depth Information
CHAPTER
4
Extensional Query Evaluation
There are two approaches to query evaluation. In extensional query evaluation, the evaluation process
is guided entirely by the query expression Q . We discuss extensional query evaluation in this chapter.
When extensional evaluation is possible, then the data complexity of Q is in polynomial time, but
we have seen that some queries are provably hard, and therefore not all queries can be evaluated
using the extensional approach. Intensional query evaluation first computes the query's lineage, then
computes the probability of the lineage expression; we discuss it in the next chapter. The intensional
approach reduces query evaluation to the problem of computing the probability of a propositional
formula. Intensional evaluation works for every query, but by losing the information on the query
expression Q , it can perform much worse than extensional query evaluation.
The queries we consider are in the Relational Calculus (RC), or one of its fragments, Unions
of Conjunctive Queries (UCQ) or Conjunctive Queries (CQ), see Section 2.1 . Thus, queries are
built from atomic predicates using the connectives
.
Throughout this chapter, we restrict the input database to be a tuple-independent or BID
database. This will be sufficient for most practical purposes because as we argued in Section 2.7 ,a
probabilistic database should be decomposed into a tuple-independent or BID database. In other
words, if D is an arbitrary probabilistic database, then it should be written as D
, , , ¬
= V( D 0 ) , where
D 0 is tuple-independent or BID, and V is a set of relational queries (views). Then, evaluating a
query Q on D reduces to evaluating the composed query Q V on D 0 . Since relational queries are
closed under query composition, it means that, for all practical purposes, it suffices to study query
evaluation on tuple-independent or BID tables.
We start our discussion of extensional query evaluation in Section 4.1 , by describing six simple
query evaluation rules. Each rule reduces the problem of evaluating Q , to the problem of evaluating
some simpler queries, Q 1 ,...,Q n . Thus, each application of a rule simplifies the query, until it
becomes a ground tuple; in that case, we simply look up its probability in the input database.
This rule-based approach is an example of extensional query evaluation since it computes the query
probability by examining only the query expression, without expanding its lineage.
When the rules succeed in computing the query, we say that the query is safe ; in that case,
the data complexity is in polynomial time. When the rules fail to evaluate the query, then we
call the query unsafe ; as we have seen in Chapter 3 , some queries are provably hard, and these
(unsurprisingly) also turn out to be unsafe. We prove that, for Unions of Conjunctive Queries, the
rules are complete, in the following sense. For any unsafe query, its data complexity is provably hard
for #P. This property is called a dichotomy theorem since every query is either in polynomial time
(when the query is safe) or provably hard for #P (when the query is unsafe). Such a dichotomy is
Search WWH ::




Custom Search