Data andQuery Model - Probabilistic Databases

Database Reference

In-Depth Information

2.6

PROPERTIES OF A REPRESENTATION SYSTEM

We expect two useful properties from a good representation system for incomplete or for probabilistic

databases. First, it should be able to represent any incomplete or probabilistic database. Second, it

should be able to represent the answer to any query, under the possible answer sets semantics. The

first property, called completeness , implies the second property, which is called closure under a query

language .

Definition 2.11 A representation system for probabilistic databases is called complete if it can

represent any 1 probabilistic database D

( W ,P) .

Theorem 2.12

PC-tables are a complete representation system.

Proof. The

proof

fairly

simple.

Given

finite

set

possible

worlds

{ R 1 ,...,R k ,..., R 1 ,...,R k }

with

probabilities

p 1 ,...,p n ,

create

pc-table

PCD

and

let P(X = i) = p i , for all 1 ≤ i ≤ n . Intuitively, there is exactly one assignment X = i , corre-

sponding to the i th possible world.

For all 1

(CD, P ) as follows. Let X be a random variable whose domain is

{

1 ,...,n

}

≤ j ≤ k , the table R j in CD is the union of all instances R j ,...,R j . For each tuple

t ∈ R j , the condition t is the disjunction of all conditions X = i , for all i s.t. t ∈ R j : formally,

t = i : t ∈ R j (X = i) . It is easy to verify that the constructed pc-table represents exactly the input

probabilistic database.

Consider a representation formalism, like pc-tables, or one of the weaker formalisms con-

sidered in the next section. Let D be a probabilistic database represented in this formalism. Given

a query Q , can we represent V

= Q( D ) in the same formalism? Here V is another probabilis-

tic database, defined by the possible answer sets semantics ( Subsection 2.3.1 ), and the question is

whether it can be represented in the same representation formalism. If the answer is “yes”, then we

say that the representation formalism is closed under that particular query language.

Obviously, any complete representation system is also closed; therefore, Theorem 2.12 has the

following Corollary:

Corollary 2.13

PC-tables are closed under the Relational Calculus.

However, using Theorem 2.12 to prove the Corollary is rather unsatisfactory because it is non-

constructive. A constructive proof of Corollary 2.13 uses lineage. More precisely, let D

= (CD, P )

be a pc-database, and let Q( x) be a query with k head variables. Let A =

ADom(CD) be the active

a ∈ A k , and each

domain of CD . Then V

= Q( D ) is the following pc-table. It consists of all tuples

tuple

a is annotated with the propositional formula Q [ a/x ]

. This defines the c-table part of V .For

1 Recall that we restrict our discussion to finite probabilistic databases.

Probabilistic Databases

Search WWH ::

Custom Search

Home