Database Reference
In-Depth Information
1
( 1
u 1 .P )
·
( 1
u 2 .P )
···
( 1
u k .P ) :
i
A (R) ={ ( a, 1
( 1
u.P )) | a A (R) }
u R : u. A = a
4.2.1.3 Independent Union
The independent union, in notation R
A S , takes as input two relations R( A 1 ,P),S( A 2 ,P) and
i
returns a relation T( A, P ) , where
A = A 1 A 2 . The deterministic part of T is the set of tuples
a. A 1
a. A 2
(ADom) k
a
A 2 (R) . If a tuple occurs in both R and S , with
probabilities p 1 and p 2 , respectively, then we set its probability in the union to 1
¯
such that
¯
A 1 (R) or
¯
( 1
p 1 )( 1
p 2 ) ;
if it occurs only in R , then we set its probability to p 1 , and similarly if it occurs only in S :
i
a. A 1 ))
a. A 2 )))
a. A 1
a. A 2
R
A S
={
(
a, 1
¯
( 1
P R (
¯
·
( 1
P S (
¯
A 1 (R)
∨¯
A 2 (R)
}
By convention, P T (t)
0 if the tuple does not appear in the table T . We usually subscript the union
on the common attributes A , to indicate the schema. To get the intuition behind the union, consider
first the typical case when the two relations have the same set of attributes
=
A
A 1 =
A 2 , then the
=
i
deterministic part of R
A S is the regular union of R and S .
In general, however, we may have to compute the union of two subexpression with differ-
ent sets of attributes, which is a domain-dependent query 4 . This is a consequence of applying
Möbius inversion rule to non-Boolean queries: for example, if q(x,y) = R(x), S(x), S(y), T (y) ,
then Möbius inversion rule requires us to compute the probability of three queries, one being
q 3 (x, y) = R(x), S(x) S(y),T (y) . This is no longer a domain-independent query: the answer
consists of all pairs (a, b) where either R(a), S(a) is true and b is any value in the domain, or
where S(b), T (b) is true and a is any value. Thus, we must define the independent union R
i
A S
in a domain-dependent way, when A 1 =
A 2 . We start by extending each tuple in R with attributes
A 2 A 1 , and we fill these with all possible values from the active domain, adding all resulting tuples
to the union, similarly for S . In practice, the initial query Q was domain-independent, and then
one can replace this domain-dependent union with a simple outer-join, A 1 (R) d—><—d A 2 (S) :in
other words, we pad the extra attributes with NULLs instead of all values in the domain. This will
not affect the semantics of the query because the query is assumed to be domain independent.
4.2.1.4 Weighted Sum
The weighted sum is an operation parameterized by k integers μ 1 ,...,μ k . It takes as input k
relations R 1 ( A 1 ,P),...,R k ( A k ,P) ; it computes natural join of their deterministic part and sets
the probability of every tuple
a in the result to μ 1 p 1 + ... + μ k p k , where p 1 ,...,p k
are the
4 A relational query is called domain independent if its semantics does not depend on the domain of the database, but it only depends
on the active domain, i.e., the set of constants occurring in the database; otherwise, it is called domain dependent . Classical examples
of domain dependent queries are q(x,y) = R(x) S(y) , and q (x) R(x) . Practical query languages like SQL or Relational
Algebra can express only domain independent queries, by design; in datalog, domain dependence is ensured by requiring that
every head variable occurs in at least one positive atom.
Search WWH ::




Custom Search