Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

relations, R 1 =

σ A = a (R) and R 2 =

σ A = a (R) , and substitute in the query Q all atoms referring to R

with R 1 ∨ R 2 . We denote Q r

the resulting query and say that Q r

is obtained from Q by “ranking”

R on the attribute A and the constant a .

Thus, the attribute ranking rule is:

P (Q) = P(Q r )

Attribute ranking

(4.8)

where Q r is obtained by ranking as explained above.

Ranking does not affect the lineage at all; it simply partitions the input tuples, but the set of

tuples remains the same, and the lineage formula is also unchanged. This proves the correctness of

the ranking rule:

If Q r

Lemma 4.5

is a ranking of Q, then on any probabilistic database D, Q = Q r . It follows that

P(Q r ) = P (Q).

Note that Q r is over a different vocabulary than Q . We further make the following syntactic

simplifications in Q r . In attribute-attribute ranking, the tuples in R 2 have A

B ,sowedropthe

attribute B and decrease the arity of R 2 by 1; also, in R 3 , where A>B , we switch the order of

the attributes A and B . In attribute-constant ranking, the tuples in R 2 have A = a ,sowedropthe

attribute A and decrease its arity by 1.

We must simplify Q r , to remove subexpressions that become identically false or true .For

example, consider the query Q

y.R(x,y),R(y,x) . Ranking by the two attributes of R

means that we partition R into three relations, R = R 1 ∪ R 2 ∪ R 3 , and obtain nine combina-

tions, Q

=∃

∃

= i,j R i (x, y), R j (y, x) . However, only three combinations are non-empty: for example

R 1 (x, y), R 1 (y, x) ≡

false , because we cannot have both x<y and y<x . It follows that Q =

R 1 (x, y), R 3 (x, y)

∨

R 2 (x) ; as explained, we write R 2 (x) instead of R 2 (x, x) , and define R 3 = yx σ x>y (R(x, y)) instead

of σ x>y (R(x, y)) .

The number of possible ranking steps depends only on the query, not on the database in-

stance. This is because we can rank a pair of attributes only once; once we ranked R on A, B to

obtain R 1 ,R 2 ,R 3 , it makes no more sense to rank R 1 again on A, B , because σ A<B (R 1 ) = R 1 and

σ A = B (R 1 ) = σ A>B (R 1 ) =∅

∨

R 2 (x), R 2 (x)

∨

R 3 (y, x), R 1 (y, x) , which reduces to R 1 (x, y), R 3 (x, y)

. And, similarly, we can rank an attribute with a constant at most once.

If the query Q has c constants, and the maximum arity of any relational symbol is k , then any query

Q generated by the rules will have at most c + k distinct constants, which places an upper bound

on the number of possible attribute-constant rankings. However, we can completely avoid ranking

w.r.t. to the new constants introduced by the independent-project rule as follows. Consider a query

∃

x.Q where x is a separator variable. Before applying the independent-project rule, we check if

there exists an atom R(...,x,...,x,...) where x occurs in two distinct positions; in that case, we

perform an attribute-attribute ranking w.r.t. to these two attributes ( x continues to be a separator

variable in the ranked query

∃ x.Q r ). Repeating this process, we can ensure that x occurs only once in

Probabilistic Databases

Search WWH ::

Custom Search

Home