2. DATA AND QUERY MODEL
The semantics of the c-table $CD$, called its representation, is the incomplete database
$\mathbf{W} = \{ W^\theta \mid \theta \in \Theta \}$. Recall that
$\Theta = Dom_{X_1} \times \cdots \times Dom_{X_n}$ is the set of all possible valuations of the variables
$X_1, \ldots, X_n$.
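The definition can be made concrete with a minimal sketch that enumerates every valuation and collects the resulting worlds. The domains, tuple names, and conditions below are hypothetical, chosen only for illustration, and conditions are modeled as Python predicates over a valuation rather than as stored formulas.

```python
from itertools import product

# Hypothetical domains for two variables (not taken from the text).
domains = {"X1": [1, 2], "X2": ["a", "b"]}

# Hypothetical c-table: each tuple is paired with a condition, written
# here as a predicate over a valuation theta (a dict: variable -> value).
c_table = [
    (("t1",), lambda th: th["X1"] == 1),
    (("t2",), lambda th: th["X1"] == 2 or th["X2"] == "a"),
]

def valuations(domains):
    """Yield every theta in Dom_X1 x ... x Dom_Xn as a dict."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        yield dict(zip(names, values))

def world(c_table, theta):
    """W^theta: the set of tuples whose condition holds under theta."""
    return frozenset(t for t, cond in c_table if cond(theta))

# The represented incomplete database: the set of distinct worlds.
worlds = {world(c_table, th) for th in valuations(domains)}
```

Several valuations may yield the same world, so the number of distinct worlds can be smaller than the number of valuations.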
All three c-tables, in Figure 2.2 and Figure 2.4 (a) and (b), are illustrations of this definition.
In each case, the table consists of a set of tuples, and each tuple is annotated with a propositional
formula. Notice that we use the term c-table somewhat abusively to denote a “c-database”, consisting
of several tables; we will also refer to a c-database as a “collection of c-tables”.
C-tables can be represented by augmenting a standard table with a column that stores
the condition associated with each tuple. While in our definition, each tuple must occur at most
once, in practice we sometimes find it convenient to allow a tuple t to occur multiple times and be
annotated with different formulas, $\Phi_1, \Phi_2, \ldots, \Phi_m$: multiple occurrences of $t$ are equivalent to a
single occurrence of $t$ annotated with the disjunction $\Phi_1 \vee \cdots \vee \Phi_m$.
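As an illustration of this equivalence, the following sketch collapses repeated tuples into one entry whose condition is the disjunction of the original conditions. The function name `merge_duplicates` and the predicate encoding of formulas are assumptions made for this example, not part of the text.

```python
from collections import defaultdict

def merge_duplicates(rows):
    """Collapse repeated tuples, OR-ing their conditions together.

    rows: iterable of (tuple_value, condition) pairs, where a condition
    is a predicate over a valuation theta (a dict: variable -> value).
    Returns a dict mapping each distinct tuple to a single condition.
    """
    grouped = defaultdict(list)
    for t, cond in rows:
        grouped[t].append(cond)
    # Bind the conditions through a default argument so that each merged
    # predicate keeps its own list of disjuncts.
    return {
        t: (lambda th, cs=tuple(conds): any(c(th) for c in cs))
        for t, conds in grouped.items()
    }

# Hypothetical input: tuple "t" occurs twice with different conditions.
rows = [
    ("t", lambda th: th["X"] == 1),
    ("t", lambda th: th["Y"] == 2),
    ("u", lambda th: True),
]
merged = merge_duplicates(rows)
```

Here `merged["t"]` holds under a valuation exactly when at least one of the two original conditions does.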
We now move to probabilistic databases. A pc-table consists of a c-table plus a probability
distribution $P$ over the set of assignments of the discrete variables $X_1, \ldots, X_n$, such that all
variables are independent. Thus, $P$ is completely specified by the numbers $P(X = a) \in [0, 1]$ that
assign a probability to each atomic event $X = a$ such that, for each random variable $X$:
$$\sum_{a \in Dom_X} P(X = a) = 1.$$
The probability of an assignment $\theta$ is given by the following expression, where $\theta(X_i) = a_i$, for
$i = 1, \ldots, n$:
$$P(\theta) = P(X_1 = a_1) \cdot P(X_2 = a_2) \cdots P(X_n = a_n) \qquad (2.2)$$
The probability of a propositional formula $\Phi$ is:
$$P(\Phi) = \sum_{\theta \in \omega(\Phi)} P(\theta) \qquad (2.3)$$
where $\omega(\Phi)$ is the set of satisfying assignments for $\Phi$.
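Equations (2.2) and (2.3) can be evaluated directly by brute force, as in the sketch below. The marginal probabilities and the formula are hypothetical, and the formula is again modeled as a Python predicate. The enumeration visits every assignment, so its cost grows exponentially with the number of variables; it is meant to illustrate the definition, not to serve as an efficient evaluation method.

```python
from itertools import product
from math import prod

# Hypothetical marginals P(X = a) for two independent variables;
# each inner dict sums to 1.
P = {"X": {1: 0.2, 2: 0.8}, "Y": {1: 0.5, 2: 0.5}}

def prob_assignment(P, theta):
    """Equation (2.2): product of the marginals of the atomic events."""
    return prod(P[x][a] for x, a in theta.items())

def prob_formula(P, phi):
    """Equation (2.3): sum P(theta) over the assignments satisfying phi."""
    names = list(P)
    total = 0.0
    for values in product(*(P[n] for n in names)):
        theta = dict(zip(names, values))
        if phi(theta):
            total += prob_assignment(P, theta)
    return total

# Hypothetical formula: (X = 1) or (Y = 2).
phi = lambda th: th["X"] == 1 or th["Y"] == 2
p_phi = prob_formula(P, phi)
```

For this input the satisfying assignments are (X=1, Y=1), (X=1, Y=2), and (X=2, Y=2), with probabilities 0.1, 0.1, and 0.4.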
Definition 2.7 A probabilistic conditional database, or pc-table for short, is a pair $PCD = (CD, P)$
where $CD$ is a c-table, and $P$ is a probability space over the set of assignments.
The semantics of a pc-table is as follows. Its set of possible worlds is the set of possible worlds
of the incomplete database $\mathbf{W}$ represented by $CD$, and the probability of each possible world
$W \in \mathbf{W}$ is defined as
$$P(W) = \sum_{\theta : W^\theta = W} P(\theta).$$
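The sketch below computes this distribution over possible worlds for a small, hypothetical pc-table by grouping assignments according to the world they produce and summing their probabilities. The variable names, marginals, and conditions are assumptions chosen so that two assignments map to the same world.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Hypothetical pc-table: marginals of independent variables, plus tuples
# annotated with conditions (predicates over a valuation theta).
P = {"X": {1: 0.2, 2: 0.8}, "Y": {1: 0.5, 2: 0.5}}
c_table = [
    ("t1", lambda th: th["X"] == 1),
    ("t2", lambda th: th["X"] == 1 or th["Y"] == 1),
]

def world_distribution(P, c_table):
    """Map each possible world W to the sum of P(theta) with W^theta = W."""
    names = list(P)
    dist = defaultdict(float)
    for values in product(*(P[n] for n in names)):
        theta = dict(zip(names, values))
        w = frozenset(t for t, cond in c_table if cond(theta))
        dist[w] += prod(P[n][theta[n]] for n in names)
    return dict(dist)

dist = world_distribution(P, c_table)
```

In this example both assignments with X = 1 produce the world containing `t1` and `t2`, so that world receives the sum of their two probabilities.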
In practice, both the c-table $CD$ and the probability space $P$ are stored in standard relations.
$CD$ is stored by augmenting each tuple with a propositional formula $\Phi$; $P$ is stored in a separate
table $W(V, D, P)$ where each row $(X, a, p)$ represents the probability of one atomic event,
$P(X = a) = p$. An example of a table $W$ is given below: