Database Reference
In-Depth Information
each atom. In any of the new queries Q r
created by the independent-project rule, the constant
a is not eligible for ranking because if it occurs on some position in one atom R(...,a,...) , then
every R -atom has a on that position.
[
a/x
]
4.1.2.7 General Remarks About the Rules
Every rule starts from a query Q and expresses P (Q) in terms of simpler queries P(Q 1 ) , P(Q 2 ),...
For each of the queries Q i , we apply another rule, and again, until we reach ground tuples, at which
point, we simply look up their probability in the database. This process succeeds only if all branches
of the rewriting end in ground tuples. If the rules succeed in computing the query, then we call it a safe
query ; otherwise, we call it an unsafe query . The rules are non-deterministic: for example, whenever
we can apply an independent join, we can also apply inclusion-exclusion. No matter in what order we
apply the rules; if they terminate, then the result P (Q) is guaranteed to be correct. We will illustrate
several examples of unsafe queries ( Subsection 4.1.3 ) and safe queries ( Subsection 4.1.4 ).
What is the data complexity of computing P (Q) using the rules? Five of the rewrite rules
do not mention the database at all: the independent-project rule is the only rule that depends on the
active domain of the database, and it increases by a factor of n
the number of queries
that need to be evaluated, while reducing by 1 the arity of all relational symbols; therefore, if the
maximum arity of any relational symbol in Q is k , then the data complexity of computing P (Q) is
O(n k ) . Thus, if we can evaluate P (Q) using these rules, then it has polynomial time data complexity.
For queries in the Relational Calculus (RC), safety is not a decidable property. This
follows from Trakthenbrot's theorem, which states that it is undecidable to check whether a
Boolean RC expression is satisfiable over finite models Libkin [ 2004 ]. For example, consider
H 0 =
=|
ADom
|
R(x),S(x,y),T(y) , the query defined in Section 3.2 . Clearly, H 0 is unsafe (we will examine
it closer in the next section). Let Q be an arbitrary query in RC, which does not use the relational
symbols R, S, T ; note that Q may use negation. Then the query H 0
Q is safe iff Q is not satisfiable:
hence, safety is undecidable.
For UCQ queries, safety is decidable because the rules can be applied in a systematic way,
as follows. First, rank all attribute-constant pairs, then repeat the following sequence of steps. (1)
Convert Q from a DNF expression Q 1 Q 2 ... to a CNF expression Q = Q 1 Q 2 ... by
applying the distributivity law. Each query Q i is called a disjunctive query (it is a disjunction of
connected conjunctive queries, see Figure 4.2 ). Apply the independent-join rule ( Eq. (4.1) ), P (Q) =
P(Q i 11
, if possible. (2) Apply the inclusion-exclusion formula,
Eq. (4.7) , to the remaining conjunctions. This results in several disjunctive queries Q j 1 Q j 2 ... :
the number of such disjunctive queries is exponential in the size of the query (of course, it is
independent on the size of the database). (3) Apply independent union Eq. (4.5) , if at all possible (4)
For each remaining disjunction Q
Q i 12 ...)
·
P(Q i 21
Q i 22 ...)
···
... , use Eq. (4.3) to choose a separator variable, x ,
then apply independent project Eq. (4.2) , to obtain new queries of the form Q [ a/x ]
=
Q j 1
Q j 2
; if no separator
variable exists, apply the attribute-attribute ranking rule for some pairs of attributes, and search again
for a separator variable. (5) Each Q [ a/x ]
is expressed in DNF; repeat from step (1). If we ever get
Search WWH ::




Custom Search