Database Reference
In-Depth Information
4. EXTENSIONAL QUERY EVALUATION
Depending on the selectivities of join conditions, probability computation can be pushed past joins
in the query plans. Olteanu et al. [ 2009 ] also extend relational plans with a new aggregation oper-
ator that is optimized using the query structure and schema information in the form of keys and
functional dependencies to minimize the number of table scans needed for probability computation.
This operator computes the query probability while factorizing the query lineage into read-once
formulas in polynomial time. Olteanu and Huang [ 2009 ] extend this idea to a class of tractable con-
junctive queries with inequalities. Gatterbauer et al. [ 2010 ] introduced the dissociation technique
described in Subsection 4.2.3 ; they define an order between query plans and thus approximate the
query probabilities with best possible upper bounds in a database independent way.
Section 4.3 is based on several references. Query evaluation on BID tables was first discussed
by Andritsos et al. [ 2006 ] and Ré et al. [ 2006 ]; the extended algorithm described in Subsection 4.3.1
and the dichotomy for conjunctive queries without self-joins was given by [ Dalvi and Suciu , 2007a ].
Safe plans for BID tables are discussed by Ré et al. [ 2006 ]. Only conjunctive queries without self-
joins have been studied over BID tables; the evaluation of UCQ, or arbitrary Relational Calculus
queries over BID tables has not been studied yet. The second rule that we give in Subsection 4.3.1 ,
which pushes the disjoint-project past unions, was communicated to us by Abhay Jha. Extensions
to queries over deterministic tables ( Subsection 4.3.2 ) and functional dependencies in the represen-
tation ( Subsection 4.3.3 ) are discussed by Dalvi and Suciu [ 2007b ], who also establish a dichotomy
for conjunctive queries without self-joins. The chase-like criteria for checking safety over tables with
functional dependencies that we described in Subsection 4.3.3 is much simpler than the criteria
given by Dalvi and Suciu [ 2007b ], and it was first introduced by Olteanu et al. [ 2009 ].
Several extensions of the query evaluation problem for richer probabilistic models have been
considered in the literature.
Sen and Deshpande [ 2007 ] discuss query evaluation over probabilistic databases represented
by a graphical model. An optimization to query processing over such probabilistic databases is
described by Senetal. [ 2008 ]. The optimization finds common subgraphs in the GM and applies
a technique similar to lifted inference [ Poole , 2003 ]; this can be very effective over large probabilistic
databases, because they tend to have a large number of similar, repeated subgraphs.
Twig queries over probabilistic XML data are discussed by Senellart and Abiteboul [ 2007 ],
Cohen et al. [ 2009 ], Kimelfeld et al. [ 2008 ], and Kimelfeld et al. [ 2009 ]: They give a polynomial-
time algorithm for computing any twig pattern over an XML document in the PrXML exp model.
Furthermore, it is shown that if the model is extended to PrXML cie , then every non-trivial twig
pattern has #P-hard data complexity. Later, Benedikt et al. [ 2010 ] consider more expressive and
exponentially more succinct models that are restrictions of recursive Markov Chains (RMCs, cf.
[ Etessami and Yannakakis , 2009 ]). RMCs are extensions of the standard Markov chains, i.e., of
graphs whose edges are labeled with probabilities and that define processes evolving via independent
choices at nodes. The extension consists of a notion of subroutine or recursive call and, consequently,
the runs of RMCs have a natural hierarchical structure, and it can thus be seen as nested words or
trees. These new models capture probabilistic versions of schemas of XML documents, can define
Search WWH ::




Custom Search