Data andQuery Model - Probabilistic Databases

Database Reference

In-Depth Information

2. DATA AND QUERY MODEL

the probability distribution of the Boolean variables, we simply keep the same distribution P as for

D .

Example 2.14 Consider the following pc-database D . Start from the c-tables R, S defined in

Example 2.9 , and let P be the probability given by

P(X 1 ) = P(X 2 ) = P(Y 1 ) = P(Y 2 ) = P(Y 3 ) = 0 . 5

(2.5)

Define D

= ( R, S ,P) to be the pc-database consisting of relations R and S , and the probability

distribution P .

Consider the query Q(x, x ) = R(x), S(x, y), S(x , y), R(x ) . Then, we can represent the

view V

= Q( D ) by the the following pc-table:

a 1

Q(a 1 ,a 1 ) = X 1 Y 1 ∨ X 1 Y 2

a 1

a 2

Q(a 1 ,a 2 ) =

X 1 X 2 Y 1 Y 3

Q(a 2 ,a 1 ) = X 1 X 2 Y 1 Y 3

a 2

a 1

a 2

Q(a 2 ,a 2 ) =

X 2 Y 3

with the same probability distribution of the propositional Boolean variables, given by Eq. (2.5) .

2.7

SIMPLE PROBABILISTIC DATABASE DESIGN

A basic principle in database design theory is that a table with some undesired functional de-

pendencies should be normalized , i.e., decomposed into smaller tables, where only key constraints

hold. The original table can be recovered as a view from the normalized tables. The tradi-

tional motivation for database normalization is to eliminate update anomalies, but decomposi-

tion into simple components is a basic, fundamental design principle, which one should follow

even when updated anomalies are not a top concern. For a simple example of database normal-

ization, consider a schema Document(did, version, title, file) : if the functional depen-

dencies did → title and did,version → file hold, then the table should be normalized into

DocumentTitle(did, title) and DocumentFile(did, version, file) . The original table

can be recovered as:

Document(d, v, t, f) :- DocumentTitle(d, t), DocumentFile(d, v, f)

(2.6)

By decomposing the Document table into the simpler tables DocumentTitle and

DocumentFile , we have removed an undesired constraint, namely the non-key dependency

did → title .

A basic principle in graphical models is that a probability distribution on a large set of ran-

dom variables should be decomposed into factors of simpler probability functions, over small sets of

these variables. These factors can be identified, for example, by using a set of axioms for reasoning

Probabilistic Databases

Search WWH ::

Custom Search

Home