In-Depth Information
composition trees that support linear-time probability computation. Fink and Olteanu [2011] propose a different approach to approximating the probability of a propositional formula by giving lower and upper bounds expressed as read-once expressions (mentioned briefly at the end of Subsection 5.3.1).
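To make the connection between read-once expressions and linear-time probability computation concrete, the following Python sketch (ours, not taken from any of the cited papers; the tuple encoding of formulas is an assumption made for illustration) evaluates the probability of a positive read-once formula bottom-up. Because every variable occurs at most once, the subformulas below each AND or OR gate range over disjoint sets of independent variables, so their probabilities combine by simple product rules.

```python
# Minimal sketch (illustrative encoding): linear-time probability of a
# positive read-once formula over independent Boolean variables.
# A node is ('var', name), ('and', [children]), or ('or', [children]).

def prob_read_once(node, p):
    """p maps each variable name to its marginal probability."""
    kind = node[0]
    if kind == 'var':
        return p[node[1]]
    child_probs = [prob_read_once(child, p) for child in node[1]]
    if kind == 'and':
        # Children mention disjoint variables, hence are independent.
        result = 1.0
        for q in child_probs:
            result *= q
        return result
    if kind == 'or':
        # P(f or g) = 1 - (1 - P(f)) * (1 - P(g)) under independence.
        result = 1.0
        for q in child_probs:
            result *= 1.0 - q
        return 1.0 - result
    raise ValueError("unknown node kind: %r" % kind)

# Example: (x1 AND x2) OR x3, every variable true with probability 0.5.
formula = ('or', [('and', [('var', 'x1'), ('var', 'x2')]), ('var', 'x3')])
print(prob_read_once(formula, {'x1': 0.5, 'x2': 0.5, 'x3': 0.5}))  # 0.625
```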
The FPTRAS algorithms described in Subsection 5.3.2 are much older and well established. They were first introduced by Karp and Luby [1983] and Karp et al. [1989] and started a line of research to derandomize these approximation techniques, eventually leading to a polynomial-time deterministic (ε, 0)-approximation algorithm [Trevisan, 2004] for k-DNF formulas. This result seems promising since a k-bound on the size of the clauses of DNF formulas is not an unrealistic assumption in the context of probabilistic databases, where k can represent the number of joins in the case of DNFs constructed by positive relational algebra queries. However, the constant in this algorithm is astronomical (above 2^50 for 3-DNF). Another approach to approximating lineage expressions is based on Fourier series: one application to probabilistic databases is discussed by Ré and Suciu [2008].
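For intuition about the FPTRAS mentioned above, here is a minimal Monte Carlo sketch in the style of the Karp-Luby coverage estimator for the probability of a DNF over independent Boolean variables (an illustrative rendering under our own clause encoding, not the exact algorithm of Subsection 5.3.2): a clause is sampled with probability proportional to its marginal probability, an assignment is sampled conditioned on that clause, and the sample counts only when the chosen clause is the first one satisfied. The scaled hit rate is then an unbiased estimate of the DNF probability.

```python
import random

# Minimal sketch of a Karp-Luby style estimator for P(F), where
# F = C_1 v ... v C_m is a DNF over independent Boolean variables.
# A clause is a dict {var: required truth value}; p maps each
# variable to P(var = True). Illustrative only.

def clause_prob(clause, p):
    q = 1.0
    for x, val in clause.items():
        q *= p[x] if val else (1.0 - p[x])
    return q

def satisfies(assignment, clause):
    return all(assignment[x] == val for x, val in clause.items())

def karp_luby(clauses, p, n_samples=100_000, rng=random):
    weights = [clause_prob(c, p) for c in clauses]
    total = sum(weights)                      # sum_i P(C_i) >= P(F)
    if total == 0.0:
        return 0.0
    hits = 0
    for _ in range(n_samples):
        # Pick clause i with probability P(C_i) / total.
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # Sample an assignment from P(. | C_i): fix C_i's literals,
        # then sample the remaining variables independently.
        assignment = dict(clauses[i])
        for x in p:
            if x not in assignment:
                assignment[x] = rng.random() < p[x]
        # Count the sample only if i is the *first* satisfied clause,
        # so each satisfying assignment is counted exactly once.
        if all(not satisfies(assignment, clauses[j]) for j in range(i)):
            hits += 1
    return total * hits / n_samples

# Example: F = (x1 AND x2) OR x3, all probabilities 0.5; exact answer 0.625.
clauses = [{'x1': True, 'x2': True}, {'x3': True}]
p = {'x1': 0.5, 'x2': 0.5, 'x3': 0.5}
print(karp_luby(clauses, p))
```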
The compilation targets discussed in Section 5.2 have a long history. In a landmark paper, Bryant [1986] introduced Ordered BDDs, which led to a flurry of work on BDD variants. A good survey of the topic can be found in [Wegener, 2004]. In the AI community, Darwiche and Marquis [2002] extended the idea of a circuit from BDDs to more complex circuits, which can succinctly represent a richer class of propositional formulas; d-DNNFs were introduced by Darwiche [2001]. Compilation can be obtained from any probabilistic inference algorithm: Darwiche [2006] describes a general approach for doing that, converting any probabilistic inference algorithm into a circuit-construction algorithm. We followed this principle when we argued that Algorithm 2 can be converted from a probability-computation algorithm into a circuit-construction algorithm.
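To illustrate why OBDDs (and, more generally, FBDDs) are useful compilation targets for probability computation, the following sketch (our own node encoding, assumed for illustration) computes the probability of the represented formula in a single bottom-up pass: at a decision node on variable x, Shannon expansion gives P = p(x) * P(high branch) + (1 - p(x)) * P(low branch), and since x is tested at most once on every root-to-sink path, the two branch probabilities can be combined independently.

```python
from functools import lru_cache

# Minimal sketch: probability of the Boolean function represented by an
# OBDD/FBDD over independent variables, in one pass over the nodes.
# Nodes are encoded as a dict: node_id -> (var, low_id, high_id);
# the two sinks are the special ids 0 (false) and 1 (true).
# This encoding is illustrative, not the book's notation.

def bdd_probability(nodes, root, p):
    @lru_cache(maxsize=None)
    def prob(n):
        if n == 0:                # false sink
            return 0.0
        if n == 1:                # true sink
            return 1.0
        var, low, high = nodes[n]
        # Shannon expansion; branch probabilities combine independently
        # because var is not tested again below this node.
        return (1.0 - p[var]) * prob(low) + p[var] * prob(high)
    return prob(root)

# Example OBDD for (x1 AND x2) OR x3 with variable order x1 < x2 < x3.
nodes = {
    2: ('x3', 0, 1),
    3: ('x2', 2, 1),
    4: ('x1', 2, 3),
}
print(bdd_probability(nodes, 4, {'x1': 0.5, 'x2': 0.5, 'x3': 0.5}))  # 0.625
```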
Section 5.4 on query compilation is based on the work by Olteanu and Huang [2008] and Jha and Suciu [2011]. Olteanu and Huang [2008] first showed that all conjunctive queries without self-joins are either read-once or #P-hard (Theorem 5.13). This is surprising because it says that, for conjunctive queries without self-joins, every query is either extremely easy or extremely hard. Jha and Suciu [2011] showed that, for UCQ queries, one obtains a strict hierarchy with respect to different compilation targets (Theorem 5.14). In other words, conjunctive queries without self-joins and UCQs behave very differently with respect to query compilation. These results are summarized in Figure 5.4 and Figure 5.5.
Several researchers have studied techniques for practical query compilation. In two different papers, Sen et al. [2010] and Roy et al. [2011] study query compilation into read-once formulas. They propose two different algorithms for testing whether the lineage of a conjunctive query without self-joins is a read-once formula, by examining both the query and the input database. Olteanu and Huang [2008] and Olteanu and Huang [2009] describe efficient compilation techniques into OBDDs and FBDDs for conjunctive queries extended with disequalities (≠) and inequalities (<), respectively, and study which queries can be compiled this way. For conjunctive queries without self-joins, the compilation-based probability computation technique presented by Olteanu et al. [2010] provably