Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

4. EXTENSIONAL QUERY EVALUATION

not to be expected from a theoretical point of view - for example, it is known that unless P

NP ,

there are problems strictly harder than P and not NP-hard (and thus not #P-hard). The dichotomy

theorem shows that none of these intermediate problems are UCQ query evaluation problems on

tuple-independent probabilistic databases. An important aspect of the dichotomy theorem is that the

distinction between polynomial-time and #P-hard queries is done solely on the query expression, i.e.,

it is a property that can be decided based on the syntax. Thus, in the case of UCQ, the partition into

polynomial time and #P-hard is done solely based on the syntax. For the full Relational Calculus, it

is not known whether a similar dichotomy holds. Even checking if a query is safe is now undecidable

because for the Relational Calculus checking satisfiability is undecidable [ Libkin , 2004 ].

In some sense, extensional query evaluation is strictly more powerful than intensional evalu-

ation, due to a new and powerful rule for evaluation, inclusion-exclusion . This rule is very effective

when applied to a query expression, but it is not practical at all on propositional formulas because

it has exponential complexity. When applied to a query, the exponential complexity is restricted to

the query expression only, and the complexity remains polynomial time in the size of the data. But

when applied to a propositional formula, the inclusion-exclusion rule is exponential in the size of the

formula, which, in turn, depends on the size of the database. In fact, as we will see in the next chapter,

inclusion-exclusion is not part of the standard set of techniques developed by the verification or AI

communities for computing the probability of a propositional formula.

Next, we turn in Section 4.2 to a practical problem: how to evaluate queries by reusing stan-

dard relational techniques, query operators and query plans. An extensional operator is a standard

relational operator (e.g., join, projection, union, selection, difference), extended to manipulate tu-

ple probabilities explicitly. For example, a join will multiply the probabilities, etc.; typically, each

extensional operator makes some assumptions on the input tuples, e.g., that they are independent,

or that they are disjoint. An extensional plan is a query plan where each operator is an extensional

operator. A safe query plan is an extensional plan that, furthermore, is guaranteed to compute all

output probabilities correctly. The rules discussed in the previous section can be adapted to compute

safe plans, and, in particular, a query is safe iff it admits a safe plan. Thus, if a query is safe, not only

can we evaluate it efficiently “in theory”, but we can actually push down the entire query evaluation

in a relational database engine, and achieve real scalability, or (if needed) parallelize its execution.

On the other hand, unsafe queries do not have any safe plans. However, in practice, we can always

use an extensional plan to compute any unsafe query and obtain some approximate probabilities. We

show that, under some restrictions, the probabilities returned by any extensional plan are guaranteed

upper bounds of the correct probabilities.

Finally, in Section 4.3 , we consider several extensions to the query evaluation problem: to BID

tables, to databases where some tables are known to be deterministic, and to representations with

key constraints.

=

Search WWH ::

Custom Search

Home