Data andQuery Model - Probabilistic Databases

Database Reference

In-Depth Information

semantics of the query is Q( D )

(t 1 ,p 1 ), (t 2 ,p 2 ),...

}

, where t 1 ,t 2 ,... are all possible answers

and p 1 ,p 2 ,... are their marginal probabilities.

This is the semantics that will be our main focus in the next chapter. The intuition behind

it is very simple. On a deterministic database, the query Q returns a set of tuples

{ t 1 ,t 2 ,... }

, while

on a probabilistic database, it returns a set of tuple-probability pairs

. These

answers can be returned to the user in decreasing order of their probabilities, such that p 1 ≥ p 2 ≥ ...

Notice that while in incomplete databases, we have two variants of the tuple answer semantics,

Q poss and Q cert ; in probabilistic databases, we only have one. The connection between them is given

by the following, where D

{ (t 1 ,p 1 ), (t 2 ,p 2 ),... }

( W ,P) :

Q poss ( W )

}

Q cert ( W ) ={ t | (t, p) ∈ Q( D ), p = 1 }

(t, p)

∈

Q( D ), p > 0

The possible tuples semantics is not compositional. Once we compute the result of a query

Q( D ) , we can no longer apply a new query Q because Q( D ) is not a probabilistic database: it is

only a collection of tuples and probabilities. However, the possible tuple semantics does compose

with the possible answer sets semantics for views: if V( D ) is a view computed using the possible

answer sets semantics, then we can apply a query Q , under the possible tuples semantics, Q(V ( D )) ;

this is equivalent to computing the query Q ◦ V on D under the possible tuples semantics.

2.4

C-TABLES AND PC-TABLES

Definition 2.2 does not suggest a practical representation of incomplete or probabilistic data. Indeed,

the explicit enumeration of all the possible worlds is not feasible when the number of worlds is very

large. To overcome this problem, several representation systems have been proposed, which are

concise ways to describe an incomplete or a probabilistic database. In this section, we define the

most general representation systems: conditional tables or c-tables, for incomplete databases and

probabilistic conditional tables or pc-tables for probabilistic databases.

A c-table is a relation where each tuple is annotated with a propositional formula, called

condition, over random variables. A pc-table further defines a probability space over the assignments

of the random variables. To define them formally, we first need a brief review of discrete variables

and propositional formulas.

Denote by Dom X

the finite domain of a discrete variable X . The event that X takes a value

∈

Dom X is denoted by X

a and is called an atomic event ,oran atomic formula .If Dom X =

{

true, false

}

, then we say that X is a Boolean variable and write X and

¬ X as shortcuts for the atomic

events X =

false , respectively. Denote by X a finite set of variables X 1 ,X 2 ,...,X n .A

valuation ,or assignment , is a function θ that maps each random variable X ∈

true and X =

X to a value θ(X) ∈

Probabilistic Databases

Search WWH ::

Custom Search

Home