Data andQuery Model - Probabilistic Databases

Database Reference

In-Depth Information

A probabilistic database is a probability space D

=

( W ,P) over an incomplete database W ,in

is a function such that W ∈ W P(W) = 1.

other words P

:

W

→ ( 0 , 1 ]

In this topic, we will restrict probabilistic databases to have a finite set of possible worlds,

unless otherwise stated; we will use the finiteness assumption all through this chapter and the next,

but we will briefly look beyond it in Section 6.3 when we discuss Monte-Carlo databases.

Intuitively, in an incomplete database the exact database instance is not known: it can be in one

of n several states, called worlds. In a probabilistic database, we furthermore consider a probability

distribution over the set of worlds. The number of worlds, n , is very large, and we shall describe

shortly some practical ways to represent incomplete and probabilistic databases.

If all instantiations of some relation R j

are the same in all possible worlds of W , i.e., if

R j

R j , then we say that R j is complete or certain or deterministic .

The marginal probability of a tuple t , or the tuple confidence , refers to the probability of the

event t ∈ R j , where R j is one of the relation names of the schema, with

= ··· =

P(W i )

P(t

∈

R j )

=

R j

1

≤

i

≤

n

:

t

∈

2.3

QUERY SEMANTICS

Since a probabilistic database can be in one of many possible states, what does it mean to evaluate a

query Q on such a database? In this section, we define the semantics of a query Q on a probabilistic

database D ; as an intermediate step, we also define the semantics of the query on an incomplete

database W . In each case, we need to consider two possible semantics. In the first, the query is

applied to every possible world, and the result consists of all possible answers (each answer is a set

of tuples); this is called the possible answer sets semantics. This semantics is compositional (we can

apply another query on the result) but is difficult or impossible to present to the user. In the second,

the query is also applied to all possible worlds, but the set of answers are combined, and a single set

of tuples is returned to the user; this is called possible answers semantics. This result can be easily

presented to the user, as a list of tuples, but it is no longer compositional since we lose track of how

tuples are grouped into worlds.

We allow the query to be any function from an input database instance to an output relation:

in other words, for the definitions in this section, we do not need to restrict the query to the relational

calculus.

Throughout this topic, we will assume that the query is defined over a deterministic database.

That is, the user assumes the database is deterministic and formulates the query accordingly, but

the system needs to evaluate it over an incomplete or probabilistic database and therefore returns

all possible sets of answers, or all possible answers. There is no way for the user to inquire about

the probabilities or the different possible worlds of the database; also, the query never introduces

uncertainty, all uncertainty is what existed in the input data. This is a limitation, which is sufficient

Probabilistic Databases

Search WWH ::

Custom Search

Home