Database Reference
In-Depth Information
1.Let D be a probabilis-
tic database with n + 1 possible worlds, R 1 ,...,R n ,R n + 1 , and denote by p 1 ,...,p n ,p n + 1 their
probabilities. For i
2. DATA AND QUERY MODEL
Assuming the statement is true for n , we will prove it for n
+
p n + 1 ) . Since i = 1 ,n + 1 p i =
1 ,...,n , let 3
=
q i =
p i /( 1
1, it follows that
i = 1 ,n q i =
1. Consider the probabilistic database D consisting of the first n worlds R 1 ,...,R n ,
with probabilities q 1 ,...,q n . By induction hypothesis, there exists a tuple-independent database
ID n = ( S n ,W n ,P n ) , s.t. W n ={ k 1 ,...,k n }
= Q n ( ID n ) .
Let k n + 1 be a new constant (not occurring in W n ): define S n + 1 = S n ∪{ k n + 1 R n + 1
, and there exists a query Q n s.t. D
and
W n + 1 = W n ∪{ k n + 1 }
. Define the probabilities of its tuples as P n + 1 (k i ) = P n (k i ) for i n and
P n + 1 (k n + 1 ) = p n + 1 . Define Q n + 1 to be the following 4 :
A K = k n + 1 (S))
if k n + 1 W
Q n + 1 =
Q n (S, W )
if k n + 1 W
In other words, Q n + 1 takes as input a possible world
S,W
, where S = S n + 1 and W
W n + 1 , and
W then it returns R n + 1 (this is the first case), and if k n + 1
does the following: if k n + 1
W then
it computes the query Q n . To see why this is correct, notice that R n + 1 is returned with probability
p n + 1 . The second case holds with probability 1
p n + 1 : by induction hypothesis, Q n returns each
R i
with probability q i , and therefore Q n + 1 returns R i
with probability q i ( 1 p n + 1 ) = p i .
Thus, tuple-independent tables, coupled with views, form a complete representation system.
However, the construction in the proof is impractical, for two reasons. First, we used in the decom-
position a different tuple for each possible world in D ; in general, D has a huge number of tuples, and
for that reason, the construction in the proof is impractical. Second, the query itself ( Q n ) depends
on the number of worlds in D . Proposition 2.16 is of theoretical interest only: we know that it is
possible to decompose any probabilistic database into components that are tuple-independent, but
it is unclear if we always want to do so. Sometimes, it is more natural to decompose the database
into BID components, which we discuss next.
We end our discussion of tuple-independent probabilistic tables by showing that one needs
the full power of Relational Calculus (RC) in Proposition 2.16 .
Proposition 2.17 Tuple-independent tables extended with UCQviews are not a complete representation
system.
Proof. We need a definition. We say that an incomplete database W has a maximal element if there
exists a world W
W
W , W
W . It is easy to see that, for
any tuple-independent probabilistic database, its incomplete database (obtained by discarding the
probabilities) has a maximal element. Indeed, since all tuples are independent, we can simply include
all of them and create a maximal world. Recall that every query Q in UCQ is monotone , meaning that,
for any two database instances W 1 W 2 , we have Q(W 1 ) Q(W 2 ) : this holds because UCQ does
3 These values can be interpreted as conditional probabilities, q i = P(R i
W that contains all other worlds:
R n + 1 ) .
4 The formal expression for Q n + 1 is
K = k n + 1 (W )) × A K = k n + 1 (S)) Q n .
Search WWH ::




Custom Search