Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

−

( 1

−

u 1 .P )

( 1

−

u 2 .P )

···

( 1

−

u k .P ) :

A (R) ={ ( a, 1

−

( 1

− u.P )) | a ∈ A (R) }

u ∈ R : u. A = a

4.2.1.3 Independent Union

The independent union, in notation R ∪

A S , takes as input two relations R( A 1 ,P),S( A 2 ,P) and

returns a relation T( A, P ) , where

A = A 1 ∪ A 2 . The deterministic part of T is the set of tuples

a. A 1 ∈

a. A 2 ∈

(ADom) k

A 2 (R) . If a tuple occurs in both R and S , with

probabilities p 1 and p 2 , respectively, then we set its probability in the union to 1

∈

such that

A 1 (R) or

− ( 1

− p 1 )( 1

− p 2 ) ;

if it occurs only in R , then we set its probability to p 1 , and similarly if it occurs only in S :

a. A 1 ))

a. A 2 )))

a. A 1 ∈

a. A 2 ∈

∪

A S

(

a, 1

−

( 1

−

P R (

( 1

−

P S (

|¯

A 1 (R)

∨¯

A 2 (R)

}

By convention, P T (t)

0 if the tuple does not appear in the table T . We usually subscript the union

on the common attributes A , to indicate the schema. To get the intuition behind the union, consider

first the typical case when the two relations have the same set of attributes

A 1 =

A 2 , then the

deterministic part of R ∪

A S is the regular union of R and S .

In general, however, we may have to compute the union of two subexpression with differ-

ent sets of attributes, which is a domain-dependent query 4 . This is a consequence of applying

Möbius inversion rule to non-Boolean queries: for example, if q(x,y) = R(x), S(x), S(y), T (y) ,

then Möbius inversion rule requires us to compute the probability of three queries, one being

q 3 (x, y) = R(x), S(x) ∨ S(y),T (y) . This is no longer a domain-independent query: the answer

consists of all pairs (a, b) where either R(a), S(a) is true and b is any value in the domain, or

where S(b), T (b) is true and a is any value. Thus, we must define the independent union R ∪

A S

in a domain-dependent way, when A 1 =

A 2 . We start by extending each tuple in R with attributes

A 2 − A 1 , and we fill these with all possible values from the active domain, adding all resulting tuples

to the union, similarly for S . In practice, the initial query Q was domain-independent, and then

one can replace this domain-dependent union with a simple outer-join, A 1 (R) d—><—d A 2 (S) :in

other words, we pad the extra attributes with NULLs instead of all values in the domain. This will

not affect the semantics of the query because the query is assumed to be domain independent.

4.2.1.4 Weighted Sum

The weighted sum is an operation parameterized by k integers μ 1 ,...,μ k . It takes as input k

relations R 1 ( A 1 ,P),...,R k ( A k ,P) ; it computes natural join of their deterministic part and sets

the probability of every tuple

a in the result to μ 1 p 1 + ... + μ k p k , where p 1 ,...,p k

are the

4 A relational query is called domain independent if its semantics does not depend on the domain of the database, but it only depends

on the active domain, i.e., the set of constants occurring in the database; otherwise, it is called domain dependent . Classical examples

of domain dependent queries are q(x,y) = R(x) ∨ S(y) , and q (x) =¬ R(x) . Practical query languages like SQL or Relational

Algebra can express only domain independent queries, by design; in datalog, domain dependence is ensured by requiring that

every head variable occurs in at least one positive atom.

Probabilistic Databases

Search WWH ::

Custom Search

Home