Database Reference
In-Depth Information
2. DATA AND QUERY MODEL
the probability distribution of the Boolean variables, we simply keep the same distribution
P
as for
D
.
Example 2.14
Consider the following pc-database
D
. Start from the c-tables
R, S
defined in
Example 2.9
, and let
P
be the probability given by
P(X
1
)
=
P(X
2
)
=
P(Y
1
)
=
P(Y
2
)
=
P(Y
3
)
=
0
.
5
(2.5)
Define
D
=
(
R, S
,P)
to be the pc-database consisting of relations
R
and
S
, and the probability
distribution
P
.
Consider the query
Q(x, x
)
=
R(x), S(x, y), S(x
, y), R(x
)
. Then, we can represent the
view
V
=
Q(
D
)
by the the following pc-table:
xx
a
1
Q
a
1
Q(a
1
,a
1
)
=
X
1
Y
1
∨
X
1
Y
2
a
1
a
2
Q(a
1
,a
2
)
=
X
1
X
2
Y
1
Y
3
Q(a
2
,a
1
)
=
X
1
X
2
Y
1
Y
3
a
2
a
1
a
2
a
2
Q(a
2
,a
2
)
=
X
2
Y
3
with the same probability distribution of the propositional Boolean variables, given by
Eq. (2.5)
.
2.7
SIMPLE PROBABILISTIC DATABASE DESIGN
A basic principle in database design theory is that a table with some undesired functional de-
pendencies should be
normalized
, i.e., decomposed into smaller tables, where only key constraints
hold. The original table can be recovered as a view from the normalized tables. The tradi-
tional motivation for database normalization is to eliminate update anomalies, but decomposi-
tion into simple components is a basic, fundamental design principle, which one should follow
even when updated anomalies are not a top concern. For a simple example of database normal-
ization, consider a schema
Document(did, version, title, file)
: if the functional depen-
dencies
did
→
title
and
did,version
→
file
hold, then the table should be normalized into
DocumentTitle(did, title)
and
DocumentFile(did, version, file)
. The original table
can be recovered as:
Document(d, v, t, f)
:-
DocumentTitle(d, t), DocumentFile(d, v, f)
(2.6)
By decomposing the
Document
table into the simpler tables
DocumentTitle
and
DocumentFile
, we have removed an undesired constraint, namely the non-key dependency
did
→
title
.
A basic principle in graphical models is that a probability distribution on a large set of ran-
dom variables should be decomposed into factors of simpler probability functions, over small sets of
these variables. These factors can be identified, for example, by using a set of axioms for reasoning