Database Reference
In-Depth Information
4. EXTENSIONAL QUERY EVALUATION
not to be expected from a theoretical point of view - for example, it is known that unless P
NP ,
there are problems strictly harder than P and not NP-hard (and thus not #P-hard). The dichotomy
theorem shows that none of these intermediate problems are UCQ query evaluation problems on
tuple-independent probabilistic databases. An important aspect of the dichotomy theorem is that the
distinction between polynomial-time and #P-hard queries is done solely on the query expression, i.e.,
it is a property that can be decided based on the syntax. Thus, in the case of UCQ, the partition into
polynomial time and #P-hard is done solely based on the syntax. For the full Relational Calculus, it
is not known whether a similar dichotomy holds. Even checking if a query is safe is now undecidable
because for the Relational Calculus checking satisfiability is undecidable [ Libkin , 2004 ].
In some sense, extensional query evaluation is strictly more powerful than intensional evalu-
ation, due to a new and powerful rule for evaluation, inclusion-exclusion . This rule is very effective
when applied to a query expression, but it is not practical at all on propositional formulas because
it has exponential complexity. When applied to a query, the exponential complexity is restricted to
the query expression only, and the complexity remains polynomial time in the size of the data. But
when applied to a propositional formula, the inclusion-exclusion rule is exponential in the size of the
formula, which, in turn, depends on the size of the database. In fact, as we will see in the next chapter,
inclusion-exclusion is not part of the standard set of techniques developed by the verification or AI
communities for computing the probability of a propositional formula.
Next, we turn in Section 4.2 to a practical problem: how to evaluate queries by reusing stan-
dard relational techniques, query operators and query plans. An extensional operator is a standard
relational operator (e.g., join, projection, union, selection, difference), extended to manipulate tu-
ple probabilities explicitly. For example, a join will multiply the probabilities, etc.; typically, each
extensional operator makes some assumptions on the input tuples, e.g., that they are independent,
or that they are disjoint. An extensional plan is a query plan where each operator is an extensional
operator. A safe query plan is an extensional plan that, furthermore, is guaranteed to compute all
output probabilities correctly. The rules discussed in the previous section can be adapted to compute
safe plans, and, in particular, a query is safe iff it admits a safe plan. Thus, if a query is safe, not only
can we evaluate it efficiently “in theory”, but we can actually push down the entire query evaluation
in a relational database engine, and achieve real scalability, or (if needed) parallelize its execution.
On the other hand, unsafe queries do not have any safe plans. However, in practice, we can always
use an extensional plan to compute any unsafe query and obtain some approximate probabilities. We
show that, under some restrictions, the probabilities returned by any extensional plan are guaranteed
upper bounds of the correct probabilities.
Finally, in Section 4.3 , we consider several extensions to the query evaluation problem: to BID
tables, to databases where some tables are known to be deterministic, and to representations with
key constraints.
=
Search WWH ::




Custom Search