Database Reference
In-Depth Information
2.8
BIBLIOGRAPHIC AND HISTORICAL NOTES
The seminal work on incomplete information by Imieli nski and Lipski [ 1984 ] introduced three
kinds of representation systems. Codd-tables are tables with nulls; v-tables are tables that may
contain variables, also called marked nulls 6 , in addition to constants; c-tables are v-tables where each
tuple is annotated with a propositional formula. In addition, a c-table may specify a global condition
that restricts the set of possible worlds to those defined by total valuations that satisfy this global
condition. The c-tables in this topic are restricted to contain only constants (no variables), and we
also dropped the global condition.
The early probabilistic data model introduced by [ Barbará et al. , 1992 ] is essentially a BID data
model. While the authors had been inspired by earlier work on incomplete information and c-tables,
a formal connection was not established until very recently. Green and Tannen [ 2006 ] provide a rig-
orous discussion of the relationships between incomplete databases and probabilistic databases, and
they introduced the term pc-table , which we also used in this topic. Several researchers have proposed
representation systems for probabilistic databases. The Trio system, discussed by Benjelloun et al.
[ 2006a , b ], Sarmaetal. [ 2006 , 2009b ], Widom [ 2005 ], designs a model for incomplete and proba-
bilistic databases based on maybe-tuples, X-tuples, and lineage expressions, searching for the right
balance of expressiveness and simplicity. Tuple-independent probabilistic databases are discussed
by Dalvi and Suciu [ 2004 ], motivated by queries with approximate predicates, which introduce an
independent event for every potential match. Disjoint-independent probabilistic databases are dis-
cussed by Andritsos et al. [ 2006 ], Dalvi and Suciu [ 2007c ]. Poole [ 1993 ] states that an arbitrary
probability space can be represented by composing primitive independent-disjoint events, which
captures the essence of Proposition 2.18 ; the proposition in the form presented here is mentioned in
Dalvi and Suciu [ 2007c ]. Proposition 2.16 and Proposition 2.17 seem folklore, but we were not able
to trace them to any specific reference. U-relations have been originally introduced in the context of
the MayBMS project by Antova et al. [ 2008 ].
Several other representation formalisms for probabilistic databases are discussed in the litera-
ture. World-set decompositions are a complete representation formalism for uncertain and probabilistic
data [ Antova et al. , 2007c , 2009 , Olteanu et al. , 2008 ]. The decomposition used by this formalism
is a prime factorization of a universal relation representation [ Ullman , 1990 ] of the set of possi-
ble worlds representing the probability space. In their probabilistic form, such decompositions can
be thought of as shallow Bayesian Networks. Li and Deshpande [ 2009 ] define the And/Xor Tree
Model , which generalizes BID tables by allowing arbitrary interleaving of and (independence) and
xor (disjointness) relationships between the tuples. The And/Xor Tree model is a special case of
WS-trees introduced by Koch and Olteanu [ 2008 ] and, subsequently, generalized by decomposition
trees [ Olteanu et al. , 2010 ]. Several of the data models for probabilistic databases are surveyed in
the first edited collection of articles on the topic [ Aggarwal , 2008 ]. More recent work considered
extensions of c-tables with continuous probability distributions [ Kennedy and Koch , 2010 ].
6 In a Codd-table, all nulls are distinct variables; in other words, a Codd-table cannot assert that two values are equal.
Search WWH ::




Custom Search