Extensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

probabilities of

a. A k in relations R 1 ,...,R k , respectively. We will apply the weighted

sum only as an instance of Möbius inversion formula, and in that case, the sum μ 1 p 1 + ... + μ k p k

is guaranteed to be a number in

a. A 1 ,...,

[ 0 , 1 ]

. We superscript the weighted sum with the weights and

A :

subscript it with the attribute names

(R 1 ,...,R k ) ={ ( a,

i = 1 ,k

μ 1 ,...,μ k

μ i P R i ( a. A i )) | a ∈ A 1 (R 1 )

...

A k (R k ) }

For example, consider the query q(x,y) = R(x), S(x), S(y), T (y) . We apply the weighted

sum to three subqueries: q 1 (x) = R(x), S(x) , q 2 (y) = S(y),T (y) and q 3 (x, y) = R(x), S(x) ∨

S(y),T (y) , in notation q(x,y)

1 , 1 , − 1

x,y (q 1 (x), q 2 (y), q 3 (x, y)) . If we compute q 3 (x, y) naively,

using the entire active domain, then the weighted sum is simply the natural join of the three sub-

queries. If we compute q 3 (x, y) as an outerjoin, then we need to interpret a NULL in the x or in the

y position as standing for all values in the active domain: the result of the weighted sum consists of

all tuples (a, b) such that a is in q 1 (x) , b is in q 2 (y) , and either (a, b) ,or (a, NULL) ,or (NULL,b)

is in q 3 (x, y) .

4.2.1.5 Complementation

The complement of a probabilistic relation R( A, P ) with k deterministic attributes is:

(ADom(D)) k

C A (R)

(

a, 1

−

P R (

a))

|¯

∈

}

In practice, since the query is domain independent, we never need to compute the complement

w.r.t. to the entire active domain, but only w.r.t. to some other relation. In that case, every complement

operation can be replaced with the difference of two relations R( A, P ) and S( A, P ) , which is

i S

−

A C A (S) .

4.2.1.6 Selection

The selection operator is the same as on deterministic tables:

σ C (R) ={ ( a, P R ( a)) | C |= ¯ a }

It is needed both to compute selections that occur naturally in the query and to implement

attribute ranking. For example, if we rank the attributes A, B of R , then we split R into three tables,

by using selection: R 1 = σ A<B (R) , R 2 = σ A = B (R) , and R 3 = σ A>B (R) .

Example 4.30 We illustrate the query plans for several (non-Boolean) queries in Figure 4.3 ,

Figure 4.4 , Figure 4.5 , and Figure 4.6 . The figures make the following conventions. All leaves show

the regular attribute names, omitting the probability attribute: for example, the relation S(x,y) in

Probabilistic Databases

Search WWH ::

Custom Search

Home