Database Reference
In-Depth Information
semantics of the query is Q( D )
={
(t 1 ,p 1 ), (t 2 ,p 2 ),...
}
, where t 1 ,t 2 ,... are all possible answers
and p 1 ,p 2 ,... are their marginal probabilities.
This is the semantics that will be our main focus in the next chapter. The intuition behind
it is very simple. On a deterministic database, the query Q returns a set of tuples
{ t 1 ,t 2 ,... }
, while
on a probabilistic database, it returns a set of tuple-probability pairs
. These
answers can be returned to the user in decreasing order of their probabilities, such that p 1 p 2 ...
Notice that while in incomplete databases, we have two variants of the tuple answer semantics,
Q poss and Q cert ; in probabilistic databases, we only have one. The connection between them is given
by the following, where D
{ (t 1 ,p 1 ), (t 2 ,p 2 ),... }
=
( W ,P) :
Q poss ( W )
}
Q cert ( W ) ={ t | (t, p) Q( D ), p = 1 }
={
t
|
(t, p)
Q( D ), p > 0
The possible tuples semantics is not compositional. Once we compute the result of a query
Q( D ) , we can no longer apply a new query Q because Q( D ) is not a probabilistic database: it is
only a collection of tuples and probabilities. However, the possible tuple semantics does compose
with the possible answer sets semantics for views: if V( D ) is a view computed using the possible
answer sets semantics, then we can apply a query Q , under the possible tuples semantics, Q(V ( D )) ;
this is equivalent to computing the query Q V on D under the possible tuples semantics.
2.4
C-TABLES AND PC-TABLES
Definition 2.2 does not suggest a practical representation of incomplete or probabilistic data. Indeed,
the explicit enumeration of all the possible worlds is not feasible when the number of worlds is very
large. To overcome this problem, several representation systems have been proposed, which are
concise ways to describe an incomplete or a probabilistic database. In this section, we define the
most general representation systems: conditional tables or c-tables, for incomplete databases and
probabilistic conditional tables or pc-tables for probabilistic databases.
A c-table is a relation where each tuple is annotated with a propositional formula, called
condition, over random variables. A pc-table further defines a probability space over the assignments
of the random variables. To define them formally, we first need a brief review of discrete variables
and propositional formulas.
Denote by Dom X
the finite domain of a discrete variable X . The event that X takes a value
a
Dom X is denoted by X
=
a and is called an atomic event ,oran atomic formula .If Dom X =
{
true, false
}
, then we say that X is a Boolean variable and write X and
¬ X as shortcuts for the atomic
events X =
false , respectively. Denote by X a finite set of variables X 1 ,X 2 ,...,X n .A
valuation ,or assignment , is a function θ that maps each random variable X
true and X =
X to a value θ(X)
Search WWH ::




Custom Search