2. DATA AND QUERY MODEL
The semantics of the c-table $CD$, called its representation, is the incomplete database $\mathbf{W} = \{ W^{\theta} \mid \theta \in \Theta \}$. Recall that $\Theta = Dom_{X_1} \times \cdots \times Dom_{X_n}$ is the set of all possible valuations of the variables $X_1, \ldots, X_n$.
All three c-tables, in Figure 2.2 and Figure 2.4 (a) and (b), are illustrations of this definition.
In each case, the table consists of a set of tuples, and each tuple is annotated with a propositional
formula. Notice that we use the term c-table somewhat abusively to denote a “c-database”, consisting
of several tables; we will also refer to a c-database as a “collection of c-tables”.
C-tables can be represented by augmenting a standard table with a column $\Phi$ that stores the condition associated with each tuple. While in our definition each tuple must occur at most once, in practice we sometimes find it convenient to allow a tuple $t$ to occur multiple times and be annotated with different formulas, $\Phi_1, \Phi_2, \ldots, \Phi_m$: multiple occurrences of $t$ are equivalent to a single occurrence of $t$ annotated with the disjunction $\Phi_1 \vee \cdots \vee \Phi_m$.
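The definition of the representation can be sketched in a few lines of Python. This is only an illustration under assumptions of ours: the domains, the tuples and the conditions below are hypothetical and do not come from the figures, and conditions are modelled as Python predicates over a valuation. The tuple `("a",)` deliberately occurs twice, so it is present in a world whenever either of its conditions holds, which matches the disjunction reading.

```python
from itertools import product

# Hypothetical domains for two variables (not taken from the text).
domains = {"X1": [1, 2], "X2": [1, 2]}

# A c-table as a list of (tuple, condition) rows; a condition is a
# predicate over a valuation. ("a",) occurs twice on purpose.
c_table = [
    (("a",), lambda th: th["X1"] == 1),
    (("a",), lambda th: th["X2"] == 1),
    (("b",), lambda th: th["X1"] != th["X2"]),
]

def valuations(domains):
    """Enumerate every valuation in Dom_X1 x ... x Dom_Xn."""
    names = sorted(domains)
    for values in product(*(domains[n] for n in names)):
        yield dict(zip(names, values))

def world(c_table, theta):
    """The world for theta: all tuples with at least one true condition."""
    return frozenset(t for t, cond in c_table if cond(theta))

def representation(c_table, domains):
    """The incomplete database: the set of worlds over all valuations."""
    return {world(c_table, th) for th in valuations(domains)}
```

Because worlds are collected in a set, two valuations that produce the same instance contribute a single world; here the four valuations yield three distinct worlds.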
We now move to probabilistic databases. A pc-table consists of a c-table plus a probability distribution $P$ over the set $\Theta$ of assignments of the discrete variables $X_1, \ldots, X_n$, such that all variables are independent. Thus, $P$ is completely specified by the numbers $P(X = a) \in [0, 1]$ that assign a probability to each atomic event $X = a$ such that, for each random variable $X$:

$$\sum_{a \in Dom_X} P(X = a) = 1.$$
The probability of an assignment $\theta \in \Theta$ is given by the following expression, where $\theta(X_i) = a_i$, for $i = 1, \ldots, n$:

$$P(\theta) = P(X_1 = a_1) \cdot P(X_2 = a_2) \cdots P(X_n = a_n) \tag{2.2}$$
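As a small sketch of Eq. (2.2), the probability of an assignment is the product of the marginals of its atomic events. The marginal values and the names `marginals`, `check_marginals` and `assignment_probability` are our own assumptions, chosen only for illustration.

```python
from math import isclose, prod

# Hypothetical marginals P(X = a) for two independent variables.
marginals = {
    "X1": {1: 0.3, 2: 0.7},
    "X2": {1: 0.5, 2: 0.5},
}

def check_marginals(marginals):
    """Each variable's probabilities must sum to 1."""
    return all(isclose(sum(dist.values()), 1.0) for dist in marginals.values())

def assignment_probability(theta, marginals):
    """Eq. (2.2): multiply P(X_i = a_i) over all variables of theta."""
    return prod(marginals[x][a] for x, a in theta.items())
```

For the assignment `{"X1": 1, "X2": 2}` this multiplies 0.3 by 0.5.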
The probability of a propositional formula $\Phi$ is:

$$P(\Phi) = \sum_{\theta \in \omega(\Phi)} P(\theta) \tag{2.3}$$

where $\omega(\Phi)$ is the set of satisfying assignments for $\Phi$.
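Eq. (2.3) can be evaluated directly by enumerating all assignments and summing the probabilities of those that satisfy the formula. The sketch below does exactly that; the marginals and the formula are hypothetical, and the formula is modelled as a Python predicate. Since the number of assignments is the product of the domain sizes, this brute-force approach is only practical for a handful of variables.

```python
from itertools import product
from math import isclose, prod

# Hypothetical marginals P(X = a) for two independent variables.
marginals = {
    "X1": {1: 0.3, 2: 0.7},
    "X2": {1: 0.5, 2: 0.5},
}

def formula_probability(phi, marginals):
    """Eq. (2.3): sum P(theta) over the satisfying assignments of phi."""
    names = sorted(marginals)
    total = 0.0
    for values in product(*(marginals[n] for n in names)):
        theta = dict(zip(names, values))
        if phi(theta):
            # Eq. (2.2): variables are independent, so multiply marginals.
            total += prod(marginals[x][a] for x, a in theta.items())
    return total

# Hypothetical formula: (X1 = 1) OR (X2 = 1).
phi = lambda th: th["X1"] == 1 or th["X2"] == 1
```

With these numbers, three of the four assignments satisfy `phi`, and their probabilities add up to about 0.65, which agrees with 1 minus the probability 0.7 · 0.5 of the one falsifying assignment.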
Definition 2.7 A probabilistic conditional database, or pc-table for short, is a pair $PCD = (CD, P)$ where $CD$ is a c-table, and $P$ is a probability space over the set of assignments.
The semantics of a pc-table is as follows. Its set of possible worlds is the set of possible worlds of the incomplete database $\mathbf{W}$ represented by $CD$, and the probability of each possible world $W \in \mathbf{W}$ is defined as

$$P(W) = \sum_{\theta \in \Theta \,:\, W^{\theta} = W} P(\theta).$$
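The semantics above groups assignments by the world they produce and adds up their probabilities. A minimal sketch, assuming a hypothetical c-table and hypothetical marginals (none of these values come from the text), with conditions written as Python predicates:

```python
from collections import defaultdict
from itertools import product
from math import isclose, prod

# Hypothetical marginals P(X = a) for two independent variables.
marginals = {
    "X1": {1: 0.3, 2: 0.7},
    "X2": {1: 0.5, 2: 0.5},
}

# Hypothetical c-table: (tuple, condition) rows.
c_table = [
    (("a",), lambda th: th["X1"] == 1 or th["X2"] == 1),
    (("b",), lambda th: th["X1"] != th["X2"]),
]

def world_distribution(c_table, marginals):
    """Map each possible world W to the sum of P(theta) over all
    assignments theta whose world equals W."""
    names = sorted(marginals)
    dist = defaultdict(float)
    for values in product(*(marginals[n] for n in names)):
        theta = dict(zip(names, values))
        p_theta = prod(marginals[x][a] for x, a in theta.items())
        w = frozenset(t for t, cond in c_table if cond(theta))
        dist[w] += p_theta
    return dict(dist)
```

In this example two different assignments produce the world containing both tuples, so its probability is the sum of their two probabilities, and the probabilities of all worlds add up to 1.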
In practice, both the c-table $CD$ and the probability space $P$ are stored in standard relations. $CD$ is stored by augmenting each tuple with a propositional formula $\Phi$; $P$ is stored in a separate table $W(V, D, P)$ where each row $(X, a, p)$ represents the probability of one atomic event, $P(X = a) = p$. An example of a table $W$ is given below: