Database Reference
In-Depth Information
2. DATA AND QUERY MODEL
the probability distribution of the Boolean variables, we simply keep the same distribution P as for
D .
Example 2.14 Consider the following pc-database D . Start from the c-tables R, S defined in
Example 2.9 , and let P be the probability given by
P(X 1 ) = P(X 2 ) = P(Y 1 ) = P(Y 2 ) = P(Y 3 ) = 0 . 5
(2.5)
Define D
= ( R, S ,P) to be the pc-database consisting of relations R and S , and the probability
distribution P .
Consider the query Q(x, x ) = R(x), S(x, y), S(x , y), R(x ) . Then, we can represent the
view V
= Q( D ) by the the following pc-table:
xx
a 1
Q
a 1
Q(a 1 ,a 1 ) = X 1 Y 1 X 1 Y 2
a 1
a 2
Q(a 1 ,a 2 ) =
X 1 X 2 Y 1 Y 3
Q(a 2 ,a 1 ) = X 1 X 2 Y 1 Y 3
a 2
a 1
a 2
a 2
Q(a 2 ,a 2 ) =
X 2 Y 3
with the same probability distribution of the propositional Boolean variables, given by Eq. (2.5) .
2.7
SIMPLE PROBABILISTIC DATABASE DESIGN
A basic principle in database design theory is that a table with some undesired functional de-
pendencies should be normalized , i.e., decomposed into smaller tables, where only key constraints
hold. The original table can be recovered as a view from the normalized tables. The tradi-
tional motivation for database normalization is to eliminate update anomalies, but decomposi-
tion into simple components is a basic, fundamental design principle, which one should follow
even when updated anomalies are not a top concern. For a simple example of database normal-
ization, consider a schema Document(did, version, title, file) : if the functional depen-
dencies did title and did,version file hold, then the table should be normalized into
DocumentTitle(did, title) and DocumentFile(did, version, file) . The original table
can be recovered as:
Document(d, v, t, f) :- DocumentTitle(d, t), DocumentFile(d, v, f)
(2.6)
By decomposing the Document table into the simpler tables DocumentTitle and
DocumentFile , we have removed an undesired constraint, namely the non-key dependency
did title .
A basic principle in graphical models is that a probability distribution on a large set of ran-
dom variables should be decomposed into factors of simpler probability functions, over small sets of
these variables. These factors can be identified, for example, by using a set of axioms for reasoning
 
Search WWH ::




Custom Search