Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

4.3.3 KEYS IN THE REPRESENTATION

It is common to have a key attribute in the representation of a tuple-independent or a BID table. For

example, consider a tuple-independent relation S(x,y) , and assume that in its representation the

attribute x is a key. This means that the tuples continue to be independent, but that the x attribute

is distinct among all possible tuples.

In general, if x is a key in the representation of a table, then it is also a key in all possible

worlds since a possible world is just some subset of the representation; the table is like a BID table

where every block is uniquely identified by x and contains a single tuple. But the converse does not

hold: if x is key in every possible world, then it means that S(x,y) is a BID-table, and it does not

imply that x is a key in the representation.

If we know which attributes are keys in the representation, then we may be able to find safe

plans for queries that otherwise would be intractable. More generally, assume an arbitrary set of

functional dependencies. For any conjunctive query without self-joins, one can check if it is tractable

using the following process, similar to a chase. We repeat until no more change is possible: if a

functional dependency X → y holds, where X is a set of variables, then add the variable y to all

other atoms that contain all variables in X . Once the chase terminates, check if the resulting query

is safe: the original query is tractable iff the chased query is tractable.

For example, assume that x

=

R(x),S(x,y),T(y) : this is the same as H 0 , but we denote it differently to emphasize that it is over

a table S(x,y) where x is a key attribute. Then the chase process described above rewrites Q to

Q = R(x,y),S(x,y),T (y) , which is a safe query. In other words, if we know that x isakeyin

the representation of S , then the query becomes tractable. It is known that the chase procedure is

complete for conjunctive queries without self-joins: it is unknown whether it is complete, or even

how to extend it to a richer class of queries.

→

y holds in S(x,y) , and consider the Boolean query Q

4.4

BIBLIOGRAPHIC AND HISTORICAL NOTES

Safe queries and safe query plans are introduced by Dalvi and Suciu [ 2004 ]. They also prove a

dichotomy into polynomial time and #P-hard for conjunctive queries without self-joins over tuple-

independent tables. The evaluation algorithm for UCQ queries that we discussed in Section 4.1

was first introduced by Dalvi et al. [ 2010 ]. They introduced the inclusion-exclusion rule and its

extension to Möbius inversion formula, and also announced the dichotomy result in Theorem 4.23 .

The complete proof of this theorem is in [ Dalvi and Suciu , 2010 ].

The safe plans discussed in Section 4.2 have an important practical limitation: they restrict

severely the choices of the query optimizers. Olteanu et al. [ 2009 ] address this problem, by decou-

pling the data processing part of the query plan from the probabilistic inference part. They introduce

a new type of plan that allows the optimizer to choose the best plan for the data processing part, yet

allowing the probabilistic inference to take advantage of the query's safety. In that framework, a safe

plan is an eager plan, where all probabilistic computations are performed as early as possible; lazy

plans are at the other extreme as they compute the probabilities after the result tuples are computed.

Search WWH ::

Custom Search

Home