Database Reference
In-Depth Information
A probabilistic database is a probability space D
=
( W ,P) over an incomplete database W ,in
is a function such that W W P(W) = 1.
other words P
:
W
( 0 , 1 ]
In this topic, we will restrict probabilistic databases to have a finite set of possible worlds,
unless otherwise stated; we will use the finiteness assumption all through this chapter and the next,
but we will briefly look beyond it in Section 6.3 when we discuss Monte-Carlo databases.
Intuitively, in an incomplete database the exact database instance is not known: it can be in one
of n several states, called worlds. In a probabilistic database, we furthermore consider a probability
distribution over the set of worlds. The number of worlds, n , is very large, and we shall describe
shortly some practical ways to represent incomplete and probabilistic databases.
If all instantiations of some relation R j
are the same in all possible worlds of W , i.e., if
R j
R j , then we say that R j is complete or certain or deterministic .
The marginal probability of a tuple t , or the tuple confidence , refers to the probability of the
event t R j , where R j is one of the relation names of the schema, with
= ··· =
P(W i )
P(t
R j )
=
R j
1
i
n
:
t
2.3
QUERY SEMANTICS
Since a probabilistic database can be in one of many possible states, what does it mean to evaluate a
query Q on such a database? In this section, we define the semantics of a query Q on a probabilistic
database D ; as an intermediate step, we also define the semantics of the query on an incomplete
database W . In each case, we need to consider two possible semantics. In the first, the query is
applied to every possible world, and the result consists of all possible answers (each answer is a set
of tuples); this is called the possible answer sets semantics. This semantics is compositional (we can
apply another query on the result) but is difficult or impossible to present to the user. In the second,
the query is also applied to all possible worlds, but the set of answers are combined, and a single set
of tuples is returned to the user; this is called possible answers semantics. This result can be easily
presented to the user, as a list of tuples, but it is no longer compositional since we lose track of how
tuples are grouped into worlds.
We allow the query to be any function from an input database instance to an output relation:
in other words, for the definitions in this section, we do not need to restrict the query to the relational
calculus.
Throughout this topic, we will assume that the query is defined over a deterministic database.
That is, the user assumes the database is deterministic and formulates the query accordingly, but
the system needs to evaluate it over an incomplete or probabilistic database and therefore returns
all possible sets of answers, or all possible answers. There is no way for the user to inquire about
the probabilities or the different possible worlds of the database; also, the query never introduces
uncertainty, all uncertainty is what existed in the input data. This is a limitation, which is sufficient
Search WWH ::




Custom Search