Database Reference
In-Depth Information
relations, R 1 =
σ A = a (R) and R 2 =
σ A = a (R) , and substitute in the query Q all atoms referring to R
with R 1 R 2 . We denote Q r
the resulting query and say that Q r
is obtained from Q by “ranking”
R on the attribute A and the constant a .
Thus, the attribute ranking rule is:
P (Q) = P(Q r )
Attribute ranking
(4.8)
where Q r is obtained by ranking as explained above.
Ranking does not affect the lineage at all; it simply partitions the input tuples, but the set of
tuples remains the same, and the lineage formula is also unchanged. This proves the correctness of
the ranking rule:
If Q r
Lemma 4.5
is a ranking of Q, then on any probabilistic database D, Q = Q r . It follows that
P(Q r ) = P (Q).
Note that Q r is over a different vocabulary than Q . We further make the following syntactic
simplifications in Q r . In attribute-attribute ranking, the tuples in R 2 have A
B ,sowedropthe
attribute B and decrease the arity of R 2 by 1; also, in R 3 , where A>B , we switch the order of
the attributes A and B . In attribute-constant ranking, the tuples in R 2 have A = a ,sowedropthe
attribute A and decrease its arity by 1.
We must simplify Q r , to remove subexpressions that become identically false or true .For
example, consider the query Q
=
y.R(x,y),R(y,x) . Ranking by the two attributes of R
means that we partition R into three relations, R = R 1 R 2 R 3 , and obtain nine combina-
tions, Q
=∃
x.
= i,j R i (x, y), R j (y, x) . However, only three combinations are non-empty: for example
R 1 (x, y), R 1 (y, x)
false , because we cannot have both x<y and y<x . It follows that Q =
R 1 (x, y), R 3 (x, y)
R 2 (x) ; as explained, we write R 2 (x) instead of R 2 (x, x) , and define R 3 = yx σ x>y (R(x, y)) instead
of σ x>y (R(x, y)) .
The number of possible ranking steps depends only on the query, not on the database in-
stance. This is because we can rank a pair of attributes only once; once we ranked R on A, B to
obtain R 1 ,R 2 ,R 3 , it makes no more sense to rank R 1 again on A, B , because σ A<B (R 1 ) = R 1 and
σ A = B (R 1 ) = σ A>B (R 1 ) =∅
R 2 (x), R 2 (x)
R 3 (y, x), R 1 (y, x) , which reduces to R 1 (x, y), R 3 (x, y)
. And, similarly, we can rank an attribute with a constant at most once.
If the query Q has c constants, and the maximum arity of any relational symbol is k , then any query
Q generated by the rules will have at most c + k distinct constants, which places an upper bound
on the number of possible attribute-constant rankings. However, we can completely avoid ranking
w.r.t. to the new constants introduced by the independent-project rule as follows. Consider a query
x.Q where x is a separator variable. Before applying the independent-project rule, we check if
there exists an atom R(...,x,...,x,...) where x occurs in two distinct positions; in that case, we
perform an attribute-attribute ranking w.r.t. to these two attributes ( x continues to be a separator
variable in the ranked query
x.Q r ). Repeating this process, we can ensure that x occurs only once in
Search WWH ::




Custom Search