Database Reference
In-Depth Information
4. EXTENSIONAL QUERY EVALUATION
Depending on the selectivities of join conditions, probability computation can be pushed past joins
in the query plans.
Olteanu et al.
[
2009
] also extend relational plans with a new aggregation oper-
ator that is optimized using the query structure and schema information in the form of keys and
functional dependencies to minimize the number of table scans needed for probability computation.
This operator computes the query probability while factorizing the query lineage into read-once
formulas in polynomial time.
Olteanu and Huang
[
2009
] extend this idea to a class of tractable con-
junctive queries with inequalities.
Gatterbauer et al.
[
2010
] introduced the dissociation technique
described in
Subsection 4.2.3
; they define an order between query plans and thus approximate the
query probabilities with best possible upper bounds in a database independent way.
Section 4.3
is based on several references. Query evaluation on BID tables was first discussed
by
Andritsos et al.
[
2006
] and
Ré et al.
[
2006
]; the extended algorithm described in
Subsection 4.3.1
and the dichotomy for conjunctive queries without self-joins was given by [
Dalvi and Suciu
,
2007a
].
Safe plans for BID tables are discussed by
Ré et al.
[
2006
]. Only conjunctive queries without self-
joins have been studied over BID tables; the evaluation of UCQ, or arbitrary Relational Calculus
queries over BID tables has not been studied yet. The second rule that we give in
Subsection 4.3.1
,
which pushes the disjoint-project past unions, was communicated to us by Abhay Jha. Extensions
to queries over deterministic tables (
Subsection 4.3.2
) and functional dependencies in the represen-
tation (
Subsection 4.3.3
) are discussed by
Dalvi and Suciu
[
2007b
], who also establish a dichotomy
for conjunctive queries without self-joins. The chase-like criteria for checking safety over tables with
functional dependencies that we described in
Subsection 4.3.3
is much simpler than the criteria
given by
Dalvi and Suciu
[
2007b
], and it was first introduced by
Olteanu et al.
[
2009
].
Several extensions of the query evaluation problem for richer probabilistic models have been
considered in the literature.
Sen and Deshpande
[
2007
] discuss query evaluation over probabilistic databases represented
by a graphical model. An optimization to query processing over such probabilistic databases is
described by
Senetal.
[
2008
]. The optimization finds common subgraphs in the GM and applies
a technique similar to
lifted inference
[
Poole
,
2003
]; this can be very effective over large probabilistic
databases, because they tend to have a large number of similar, repeated subgraphs.
Twig queries over probabilistic XML data are discussed by
Senellart and Abiteboul
[
2007
],
Cohen et al.
[
2009
],
Kimelfeld et al.
[
2008
], and
Kimelfeld et al.
[
2009
]: They give a polynomial-
time algorithm for computing any twig pattern over an XML document in the
PrXML
exp
model.
Furthermore, it is shown that if the model is extended to
PrXML
cie
, then
every
non-trivial twig
pattern has #P-hard data complexity. Later,
Benedikt et al.
[
2010
] consider more expressive and
exponentially more succinct models that are restrictions of recursive Markov Chains (RMCs, cf.
[
Etessami and Yannakakis
,
2009
]). RMCs are extensions of the standard Markov chains, i.e., of
graphs whose edges are labeled with probabilities and that define processes evolving via independent
choices at nodes. The extension consists of a notion of subroutine or recursive call and, consequently,
the runs of RMCs have a natural hierarchical structure, and it can thus be seen as nested words or
trees. These new models capture probabilistic versions of schemas of XML documents, can define