Database Reference
In-Depth Information
illustrate one such example, by showing how to decompose the c-table in Figure 2.4 (a) into BID
tables.
Extend the c-table R
in Figure 2.4 (a), to a pc-table by defining the following
Example 2.19
probabilities:
P(X = 1 ) = 0 . 2
P(X = 2 ) = 0 . 8
P(Y
=
1 )
=
0 . 3
P(X
=
2 )
=
0 . 7
P(Z =
1 ) =
0 . 5
P(Z =
2 ) =
0 . 5
With some abuse of notation, we also call the pc-table R . This table has undesired constraints
between tuples because two tuples are disjoint, if they have either the same FID or the same
SSN . Here is a better design: decompose this table into two BID tables, T( FID , SSN, N, P) and
S( SSN , FID, P) . Here, T represents the independent choices for interpreting the hand-written cen-
sus forms in Figure 2.1 : this is a BID table. S represents the independent choices of assigning each
SSN uniquely to one person (or, more concretely, to one census form): this is also a BID table. Both
BID tables S and T are shown in Figure 2.8 . The original table R can be reconstructed as:
R (fid,ssn,n) :- S(ssn, f id), T ( f id ,ssn,n).
Thus, in this new representation, there are no more hidden constraints in S and T since these
are BID tables. However, R is not a BID table at all! Tuples with the same FID are disjoint, and so
are tuples with the same SSN, but the tuples can no longer be partitioned into independent blocks
of disjoint tuples; in fact, any two of the four possible tuples in Figure 2.4 (a) are correlated.
At the risk of reiterating the obvious, we note that the instances for T and S can become very
large, yet their probabilistic model is very simple (BID); on the other hand, the instance R has a
complicated probabilistic model, but it is derived from the simple tables T and S by using the simple
query above.
The example suggests that in many applications, the probabilistic database, even if it has
a complex probability space, can be decomposed naturally into BID tables. But some probabilis-
tic databases do not seem to have a natural decomposition. Consider, for example, the c-table in
Figure 2.4 (b) (extended with some arbitrary probability distribution for the discrete variables X and
Y ). It seems difficult to find a natural decomposition of R into BID tables. Of course, we can apply
the proof of Proposition 2.18 , but this requires us to define a BID table that has one tuple for every
possible world, which is not practical. Better designs are still possible, but it is unclear how practical
they are for this particular example.
2.7.3 U-DATABASES
Neither tuple-independent nor BID databases are closed under queries in the Relational Calculus. U-
relations are a convenient representation formalism, which allows us to express naturally the result
 
Search WWH ::




Custom Search