Database Reference
In-Depth Information
CHAPTER
5
Intensional Query Evaluation
In this chapter we discuss intensional query evaluation , which is based on the query's lineage. In-
tensional query evaluation reduces the query evaluation problem to computing the probability of a
propositional formula, P () .
Given a pc-database D (i.e., a set of pc-tables, cf. Section 2.7 ), a query Q(
x) , and a possible
¯
tuple
a , intensional evaluation computes P (Q) in two conceptual steps: first, compute the lineage
Q [ a/x ]
, then compute the probability P () . In practice, these steps are often intertwined. The
size of the lineage expression Q depends both on the query and on the database. This dependency
is polynomial in D and exponential in Q . More precisely, the size of the lineage formula is
=
| Q |=
m ) , where m is the number of variables in Q , and ADom is the active domain of the
database, i.e., the set of all constants occurring in the database.
The data complexity for the first step is in polynomial time, meaning that for any fixed query
Q , one can compute Q in polynomial time in the size of the database. The main complexity in
intensional query evaluation comes from the second step. Here we need to compute P () , where
is a propositional formula, whose size depends on the input database. This is a key difference to
extensional query evaluation; for example, we cannot apply the inclusion-exclusion rule on because
its complexity would be exponential in the size of and thus in the size of the input database.
The probability computation problem for propositional formulas has been studied both in
the verification and in the AI communities. We start by reviewing the most basic approaches for
computing P () in Section 5.1 . These can be best explained in terms of a set of rules, which compute
P () as a function of simpler propositional formulas P( 1 ),...,P( n ) . The rules either exploit
independence between formulas, or disjointness, or apply Shannon's expansion on one Boolean
variable X . The complexity of evaluating P () depends significantly on the order in which the rules
are applied, and on the order in which variables are eliminated during the Shannon expansion step.
Formula compilation refers to the translation of a propositional formula into a circuit for
computing . The circuit is required to have certain properties that ensure that one can compute
P () in linear time in the size of the circuit. We discuss in Section 5.2 four well-known types
of circuits, also called compilation targets : read-once formulas, OBDD, FBDD, and d-DNNF. One
approach to computing P () is thus to first compile and then compute its probability. The
advantage of this approach is that the compiled circuit can also be used for other applications,
besides the probability computation [ Darwiche and Marquis , 2002 ]. Most evaluation algorithms
for P () can be converted into an algorithm for constructing a circuit for [ Darwiche , 2006 ].
O( |
ADom
|
Search WWH ::




Custom Search