Data andQuery Model - Probabilistic Databases

Database Reference

In-Depth Information

1.Let D be a probabilis-

tic database with n + 1 possible worlds, R 1 ,...,R n ,R n + 1 , and denote by p 1 ,...,p n ,p n + 1 their

probabilities. For i

2. DATA AND QUERY MODEL

Assuming the statement is true for n , we will prove it for n

p n + 1 ) . Since i = 1 ,n + 1 p i =

1 ,...,n , let 3

q i =

p i /( 1

−

1, it follows that

i = 1 ,n q i =

1. Consider the probabilistic database D consisting of the first n worlds R 1 ,...,R n ,

with probabilities q 1 ,...,q n . By induction hypothesis, there exists a tuple-independent database

ID n = ( S n ,W n ,P n ) , s.t. W n ={ k 1 ,...,k n }

= Q n ( ID n ) .

Let k n + 1 be a new constant (not occurring in W n ): define S n + 1 = S n ∪{ k n + 1 }× R n + 1

, and there exists a query Q n s.t. D

and

W n + 1 = W n ∪{ k n + 1 }

. Define the probabilities of its tuples as P n + 1 (k i ) = P n (k i ) for i ≤ n and

P n + 1 (k n + 1 ) = p n + 1 . Define Q n + 1 to be the following 4 :

A (σ K = k n + 1 (S))

if k n + 1 ∈ W

Q n + 1 =

Q n (S, W )

if k n + 1 ∈ W

In other words, Q n + 1 takes as input a possible world

S,W

, where S = S n + 1 and W

⊆ W n + 1 , and

W then it returns R n + 1 (this is the first case), and if k n + 1 ∈

does the following: if k n + 1 ∈

W then

it computes the query Q n . To see why this is correct, notice that R n + 1 is returned with probability

p n + 1 . The second case holds with probability 1

−

p n + 1 : by induction hypothesis, Q n returns each

R i

with probability q i , and therefore Q n + 1 returns R i

with probability q i ( 1 − p n + 1 ) = p i .

Thus, tuple-independent tables, coupled with views, form a complete representation system.

However, the construction in the proof is impractical, for two reasons. First, we used in the decom-

position a different tuple for each possible world in D ; in general, D has a huge number of tuples, and

for that reason, the construction in the proof is impractical. Second, the query itself ( Q n ) depends

on the number of worlds in D . Proposition 2.16 is of theoretical interest only: we know that it is

possible to decompose any probabilistic database into components that are tuple-independent, but

it is unclear if we always want to do so. Sometimes, it is more natural to decompose the database

into BID components, which we discuss next.

We end our discussion of tuple-independent probabilistic tables by showing that one needs

the full power of Relational Calculus (RC) in Proposition 2.16 .

Proposition 2.17 Tuple-independent tables extended with UCQviews are not a complete representation

system.

Proof. We need a definition. We say that an incomplete database W has a maximal element if there

exists a world W

W ∈

W , W ⊆

W . It is easy to see that, for

any tuple-independent probabilistic database, its incomplete database (obtained by discarding the

probabilities) has a maximal element. Indeed, since all tuples are independent, we can simply include

all of them and create a maximal world. Recall that every query Q in UCQ is monotone , meaning that,

for any two database instances W 1 ⊆ W 2 , we have Q(W 1 ) ⊆ Q(W 2 ) : this holds because UCQ does

3 These values can be interpreted as conditional probabilities, q i = P(R i

∈

W that contains all other worlds:

∀

|¬ R n + 1 ) .

4 The formal expression for Q n + 1 is

(σ K = k n + 1 (W )) × A (σ K = k n + 1 (S)) ∪ Q n .

∅

Probabilistic Databases

Search WWH ::

Custom Search

Home