Database Reference
In-Depth Information
relations,
R
1
=
σ
A
=
a
(R)
and
R
2
=
σ
A
=
a
(R)
, and substitute in the query
Q
all atoms referring to
R
with
R
1
∨
R
2
. We denote
Q
r
the resulting query and say that
Q
r
is obtained from
Q
by “ranking”
R
on the attribute
A
and the constant
a
.
Thus, the attribute ranking rule is:
P (Q)
=
P(Q
r
)
Attribute ranking
(4.8)
where
Q
r
is obtained by ranking as explained above.
Ranking does not affect the lineage at all; it simply partitions the input tuples, but the set of
tuples remains the same, and the lineage formula is also unchanged. This proves the correctness of
the ranking rule:
If Q
r
Lemma 4.5
is a ranking of Q, then on any probabilistic database D,
Q
=
Q
r
. It follows that
P(Q
r
)
=
P (Q).
Note that
Q
r
is over a different vocabulary than
Q
. We further make the following syntactic
simplifications in
Q
r
. In attribute-attribute ranking, the tuples in
R
2
have
A
B
,sowedropthe
attribute
B
and decrease the arity of
R
2
by 1; also, in
R
3
, where
A>B
, we switch the order of
the attributes
A
and
B
. In attribute-constant ranking, the tuples in
R
2
have
A
=
a
,sowedropthe
attribute
A
and decrease its arity by 1.
We must simplify
Q
r
, to remove subexpressions that become identically
false
or
true
.For
example, consider the query
Q
=
y.R(x,y),R(y,x)
. Ranking by the two attributes of
R
means that we partition
R
into three relations,
R
=
R
1
∪
R
2
∪
R
3
, and obtain nine combina-
tions,
Q
=∃
x.
∃
=
i,j
R
i
(x, y), R
j
(y, x)
. However, only three combinations are non-empty: for example
R
1
(x, y), R
1
(y, x)
≡
false
, because we cannot have both
x<y
and
y<x
. It follows that
Q
=
R
1
(x, y), R
3
(x, y)
∨
R
2
(x)
; as explained, we write
R
2
(x)
instead of
R
2
(x, x)
, and define
R
3
=
yx
σ
x>y
(R(x, y))
instead
of
σ
x>y
(R(x, y))
.
The number of possible ranking steps depends only on the query, not on the database in-
stance. This is because we can rank a pair of attributes only once; once we ranked
R
on
A, B
to
obtain
R
1
,R
2
,R
3
, it makes no more sense to rank
R
1
again on
A, B
, because
σ
A<B
(R
1
)
=
R
1
and
σ
A
=
B
(R
1
)
=
σ
A>B
(R
1
)
=∅
∨
R
2
(x), R
2
(x)
∨
R
3
(y, x), R
1
(y, x)
, which reduces to
R
1
(x, y), R
3
(x, y)
. And, similarly, we can rank an attribute with a constant at most once.
If the query
Q
has
c
constants, and the maximum arity of any relational symbol is
k
, then any query
Q
generated by the rules will have at most
c
+
k
distinct constants, which places an upper bound
on the number of possible attribute-constant rankings. However, we can completely avoid ranking
w.r.t. to the new constants introduced by the independent-project rule as follows. Consider a query
∃
x.Q
where
x
is a separator variable. Before applying the independent-project rule, we check if
there exists an atom
R(...,x,...,x,...)
where
x
occurs in two distinct positions; in that case, we
perform an attribute-attribute ranking w.r.t. to these two attributes (
x
continues to be a separator
variable in the ranked query
∃
x.Q
r
). Repeating this process, we can ensure that
x
occurs only once in