Database Reference
In-Depth Information
5. INTENSIONAL QUERY EVALUATION
if the DNFs are very large and the Karp-Luby MC is preferable if its probability is small. In a
DNF that was created as the lineage of a conjunctive query, the number of variables in each clause
is the number of joins in the query plus one. So, even if we assume that probabilities of base tuples
in tuple-independent or BID tables are lower-bounded by 0.1, a query with three joins may still
produce probabilities on the order of 1/10000. Taking the above bounds at face value, very large
DNFs - as a product of projections that map large numbers of tuples together - are required for
the naïve algorithm to be competitive with the Karp-Luby algorithm. Note, though, that these
bounds on the number of required iterations are far from tight, and sequential analysis techniques
such as those of Dagum et al. [ 2000 ] can be used to detect when a Monte Carlo algorithm can be
stopped much earlier.The technique of Dagum et al. [ 2000 ] puts the more sophisticated algorithm at
advantage over the naïve one - in experiments performed in [ Koch and Olteanu , 2008 , Olteanu et al. ,
2009 , 2010 ], the optimal approximation scheme of Dagum et al. [ 2000 ] usually led to two orders
of magnitude fewer Monte Carlo iterations than the above bound on the required iterations of the
Karp-Luby algorithm suggested.
While the absolute values of the output probabilities are often of little significance to the user,
the system needs good approximations of these probabilities for several purposes: in order to rank the
output answers, cf. Section 6.1 , when approximate probabilities are used in range predicates [ Koch ,
2008b ], or when conditional probabilities are computed as ratios of approximated probabilities [ Koch ,
2008b , Koch and Olteanu , 2008 ].
5.4
QUERY COMPILATION
In this section, we restrict the input database D to be a tuple-independent database, meaning that
every tuple t is annotated with a unique Boolean variable X t . We study the following problem. Fix a
Boolean query Q , which determines a family of propositional formulas, Q , one formula for every
database D . Consider one of the four compilation targets discussed in Section 5.2 . Query compilation
is a function that maps every database D into a circuit for Q in that target. We say that the query
admits an efficient compilation , if the size of this circuit is bounded by a polynomial in the size of the
database D . In this section, we ask the following question: which queries admit an efficient compilation
into a given target?
Denote
one of the four compilation targets: RO (read once), OBDD, FBDD, and d-
DNNF ¬ . We consider three query languages: the entire Relational Calculus (RC), Unions of Con-
junctive Queries (UCQ), and Conjunctive Queries (CQ), see Section 2.1 . Thus, queries are built
from atomic predicates using the connectives
C
, , , ¬
L
. For each query language
, we denote
L
C
L
C
(
) the class of queries in
that admit an “efficient compilation” to
. Formally:
L
Definition 5.12
Let
be a query language.
￿
L
with the following property: for every database
instance D , the lineage Q is a read-once propositional formula.
(RO) represents the set of queries Q
L
Search WWH ::




Custom Search