In-Depth Information
composition trees that support linear-time probability computation. Fink and Olteanu [2011] propose a different approach to approximating the probability of a propositional formula by giving lower and upper bounds expressed as read-once expressions (mentioned briefly at the end of Subsection 5.3.1).
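To make the connection between read-once expressions and linear-time probability computation concrete, the following Python sketch (ours, not taken from any of the cited papers; the tuple encoding of formulas is an assumption made for illustration) evaluates the probability of a positive read-once formula bottom-up. Because every variable occurs at most once, the subformulas below each AND or OR gate range over disjoint sets of independent variables, so their probabilities combine by simple product rules.

```python
# Minimal sketch (illustrative encoding): linear-time probability of a
# positive read-once formula over independent Boolean variables.
# A node is ('var', name), ('and', [children]), or ('or', [children]).

def prob_read_once(node, p):
    """p maps each variable name to its marginal probability."""
    kind = node[0]
    if kind == 'var':
        return p[node[1]]
    child_probs = [prob_read_once(child, p) for child in node[1]]
    if kind == 'and':
        # Children mention disjoint variables, hence are independent.
        result = 1.0
        for q in child_probs:
            result *= q
        return result
    if kind == 'or':
        # P(f or g) = 1 - (1 - P(f)) * (1 - P(g)) under independence.
        result = 1.0
        for q in child_probs:
            result *= 1.0 - q
        return 1.0 - result
    raise ValueError("unknown node kind: %r" % kind)

# Example: (x1 AND x2) OR x3, every variable true with probability 0.5.
formula = ('or', [('and', [('var', 'x1'), ('var', 'x2')]), ('var', 'x3')])
print(prob_read_once(formula, {'x1': 0.5, 'x2': 0.5, 'x3': 0.5}))  # 0.625
```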
The FPTRAS algorithms described in Subsection 5.3.2 are much older and well established. They were first introduced by Karp and Luby [1983] and Karp et al. [1989] and started a line of research to derandomize these approximation techniques, eventually leading to a polynomial-time deterministic (ε, 0)-approximation algorithm [Trevisan, 2004] for k-DNF formulas. This result seems promising since a k-bound on the size of the clauses of DNF formulas is not an unrealistic assumption in the context of probabilistic databases, where k can represent the number of joins in the case of DNFs constructed by positive relational algebra queries. However, the constant in this algorithm is astronomical (above 2^50 for 3-DNF). Another approach to approximating lineage expressions is based on Fourier series: one application to probabilistic databases is discussed by Ré and Suciu [2008].
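For intuition about the FPTRAS mentioned above, here is a minimal Monte Carlo sketch in the style of the Karp-Luby coverage estimator for the probability of a DNF over independent Boolean variables (an illustrative rendering under our own clause encoding, not the exact algorithm of Subsection 5.3.2): a clause is sampled with probability proportional to its marginal probability, an assignment is sampled conditioned on that clause, and the sample counts only when the chosen clause is the first one satisfied. The scaled hit rate is then an unbiased estimate of the DNF probability.

```python
import random

# Minimal sketch of a Karp-Luby style estimator for P(F), where
# F = C_1 v ... v C_m is a DNF over independent Boolean variables.
# A clause is a dict {var: required truth value}; p maps each
# variable to P(var = True). Illustrative only.

def clause_prob(clause, p):
    q = 1.0
    for x, val in clause.items():
        q *= p[x] if val else (1.0 - p[x])
    return q

def satisfies(assignment, clause):
    return all(assignment[x] == val for x, val in clause.items())

def karp_luby(clauses, p, n_samples=100_000, rng=random):
    weights = [clause_prob(c, p) for c in clauses]
    total = sum(weights)                      # sum_i P(C_i) >= P(F)
    if total == 0.0:
        return 0.0
    hits = 0
    for _ in range(n_samples):
        # Pick clause i with probability P(C_i) / total.
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # Sample an assignment from P(. | C_i): fix C_i's literals,
        # then sample the remaining variables independently.
        assignment = dict(clauses[i])
        for x in p:
            if x not in assignment:
                assignment[x] = rng.random() < p[x]
        # Count the sample only if i is the *first* satisfied clause,
        # so each satisfying assignment is counted exactly once.
        if all(not satisfies(assignment, clauses[j]) for j in range(i)):
            hits += 1
    return total * hits / n_samples

# Example: F = (x1 AND x2) OR x3, all probabilities 0.5; exact answer 0.625.
clauses = [{'x1': True, 'x2': True}, {'x3': True}]
p = {'x1': 0.5, 'x2': 0.5, 'x3': 0.5}
print(karp_luby(clauses, p))
```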
The compilation targets discussed in Section 5.2 have a long history. In a landmark paper, Bryant [1986] introduced Ordered BDDs, which led to a flurry of work on BDD variants. A good survey of the topic can be found in [Wegener, 2004]. In the AI community, Darwiche and Marquis [2002] extended the idea of a circuit from BDDs to more complex circuits, which can succinctly represent a richer class of propositional formulas; d-DNNFs were introduced by Darwiche [2001]. Compilation can be obtained from any probabilistic inference algorithm: Darwiche [2006] describes a general approach for doing that, converting any probabilistic inference algorithm into a circuit-construction algorithm. We followed this principle when we argued that Algorithm 2 can be converted from a probability-computation algorithm into a circuit-construction algorithm.
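To illustrate why OBDDs (and, more generally, FBDDs) are useful compilation targets for probability computation, the following sketch (our own node encoding, assumed for illustration) computes the probability of the represented formula in a single bottom-up pass: at a decision node on variable x, Shannon expansion gives P = p(x) * P(high branch) + (1 - p(x)) * P(low branch), and since x is tested at most once on every root-to-sink path, the two branch probabilities can be combined independently.

```python
from functools import lru_cache

# Minimal sketch: probability of the Boolean function represented by an
# OBDD/FBDD over independent variables, in one pass over the nodes.
# Nodes are encoded as a dict: node_id -> (var, low_id, high_id);
# the two sinks are the special ids 0 (false) and 1 (true).
# This encoding is illustrative, not the book's notation.

def bdd_probability(nodes, root, p):
    @lru_cache(maxsize=None)
    def prob(n):
        if n == 0:                # false sink
            return 0.0
        if n == 1:                # true sink
            return 1.0
        var, low, high = nodes[n]
        # Shannon expansion; branch probabilities combine independently
        # because var is not tested again below this node.
        return (1.0 - p[var]) * prob(low) + p[var] * prob(high)
    return prob(root)

# Example OBDD for (x1 AND x2) OR x3 with variable order x1 < x2 < x3.
nodes = {
    2: ('x3', 0, 1),
    3: ('x2', 2, 1),
    4: ('x1', 2, 3),
}
print(bdd_probability(nodes, 4, {'x1': 0.5, 'x2': 0.5, 'x3': 0.5}))  # 0.625
```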
Section 5.4 on query compilation is based on the work by Olteanu and Huang [2008] and Jha and Suciu [2011]. Olteanu and Huang [2008] first showed that all conjunctive queries without self-joins are either read-once or #P-hard (Theorem 5.13). This is surprising because it says that, for conjunctive queries without self-joins, every query is either extremely easy or extremely hard. Jha and Suciu [2011] showed that, for UCQ queries, one obtains a strict hierarchy with respect to different compilation targets (Theorem 5.14). In other words, conjunctive queries without self-joins and UCQs behave very differently with respect to query compilation. These results are summarized in Figure 5.4 and Figure 5.5.
Several researchers have studied techniques for practical query compilation. In two different papers, Sen et al. [2010] and Roy et al. [2011] study query compilation into read-once formulas. They propose two different algorithms for testing whether the lineage of a conjunctive query without self-joins is a read-once formula, by examining both the query and the input database. Olteanu and Huang [2008] and Olteanu and Huang [2009] describe efficient compilation techniques into OBDDs and FBDDs for conjunctive queries extended with disequalities (≠) and inequalities (<), respectively, and study which queries can be compiled this way. For conjunctive queries without self-joins, the compilation-based probability computation technique presented by Olteanu et al. [2010] provably