Data andQuery Model - Probabilistic Databases

Database Reference

In-Depth Information

nellSource(source, P)

nellExtraction(entity, relation, value, source, P)

The first table stores all the sources and their reliabilities. Source reliability is independent across

sources, so the tuples in nellSource are independent.The second table stores the extractions, conditioned

on the source being reliable: under this condition all extractions are independent. Thus, we have

represented the entire database using two large tuple-independent tables. Our initial probabilistic

database shown in Figure 1.2 can be derived from the base tables using the following views 2 :

ProducesProduct (x, y) :- nellExtraction (x, 'ProducesProduct' ,y,s), nellSource (s)

HeadquarteredIn (x, y) :- nellExtraction (x, 'HeadquarteredIn' ,y,s), nellSource (s)

...

Thus, all views are expressed over two tuple-independent tables, but they contain tuples that are

correlated in complex ways.

Does this example generalize? Can we express any probabilistic database as a view over tuple-

independent tables? The answer is “yes”, but the tuple-independent tables may be very large, and

even the view definition may be large too:

Tuple-independent tables extended with RC views are a complete representation

Proposition 2.16

system.

Proof. Let D

possible worlds. To simplify the

notations, assume that the database schema has a single relation name R( A) ; hence, the n possible

worlds in D are n relations, R 1 ,...,R n ; the general case follows immediately and is omitted. We

prove the following statement by induction on n :if D has n possible worlds, then there exists a prob-

abilistic database ID n = ( S n ,W n ,P n ) , over a schema

= ( W ,P) be a probabilistic database with n =|

A), W (K)

S(K,

, such that (a) the relation

W n is a tuple-independent probabilistic relation, with n

1 independent tuples k 1 ,...,k n − 1 , with

probabilities P n (k 1 ),...,P n (k n − 1 ) , (b) the relation S n is a deterministic relation, and (c) there exists

a query, Q n (which depends on n ), in the Relational Calculus, s.t. D

−

Q n ( ID n ) . Note that ID n

has 2 n − 1 possible worlds: the query Q n maps them to only n possible outputs, returning exactly D .

If n

1, then D has a single world R 1 . Choose any constant k 1 and define: S 1 ={

R 1 ,

k 1 }×

W 1 =∅

, and:

Q 1 =

A (S)

In other words, S 1 is exactly R 1 (plus one extra attribute), and the query Q 1 projects out that extra

attribute.

2 Note that the view definition does not mention the attribute P . This is standard in probabilistic databases: the query is written

over the possible world, not over the representation.

Probabilistic Databases

Search WWH ::

Custom Search

Home