Database Reference
In-Depth Information
nellSource(source, P)
nellExtraction(entity, relation, value, source, P)
The first table stores all the sources and their reliabilities. Source reliability is independent across
sources, so the tuples in nellSource are independent.The second table stores the extractions, conditioned
on the source being reliable: under this condition all extractions are independent. Thus, we have
represented the entire database using two large tuple-independent tables. Our initial probabilistic
database shown in Figure 1.2 can be derived from the base tables using the following views 2 :
ProducesProduct (x, y) :- nellExtraction (x, 'ProducesProduct' ,y,s), nellSource (s)
HeadquarteredIn (x, y) :- nellExtraction (x, 'HeadquarteredIn' ,y,s), nellSource (s)
...
Thus, all views are expressed over two tuple-independent tables, but they contain tuples that are
correlated in complex ways.
Does this example generalize? Can we express any probabilistic database as a view over tuple-
independent tables? The answer is “yes”, but the tuple-independent tables may be very large, and
even the view definition may be large too:
Tuple-independent tables extended with RC views are a complete representation
Proposition 2.16
system.
Proof. Let D
possible worlds. To simplify the
notations, assume that the database schema has a single relation name R( A) ; hence, the n possible
worlds in D are n relations, R 1 ,...,R n ; the general case follows immediately and is omitted. We
prove the following statement by induction on n :if D has n possible worlds, then there exists a prob-
abilistic database ID n = ( S n ,W n ,P n ) , over a schema
= ( W ,P) be a probabilistic database with n =|
W
|
A), W (K)
S(K,
, such that (a) the relation
W n is a tuple-independent probabilistic relation, with n
1 independent tuples k 1 ,...,k n 1 , with
probabilities P n (k 1 ),...,P n (k n 1 ) , (b) the relation S n is a deterministic relation, and (c) there exists
a query, Q n (which depends on n ), in the Relational Calculus, s.t. D
Q n ( ID n ) . Note that ID n
has 2 n 1 possible worlds: the query Q n maps them to only n possible outputs, returning exactly D .
If n
=
1, then D has a single world R 1 . Choose any constant k 1 and define: S 1 ={
R 1 ,
=
k 1
W 1 =∅
, and:
Q 1 =
A (S)
In other words, S 1 is exactly R 1 (plus one extra attribute), and the query Q 1 projects out that extra
attribute.
2 Note that the view definition does not mention the attribute P . This is standard in probabilistic databases: the query is written
over the possible world, not over the representation.
Search WWH ::




Custom Search