Intensional Query Evaluation - Probabilistic Databases

Database Reference

In-Depth Information

CHAPTER

5

Intensional Query Evaluation

In this chapter we discuss intensional query evaluation , which is based on the query's lineage. In-

tensional query evaluation reduces the query evaluation problem to computing the probability of a

propositional formula, P () .

Given a pc-database D (i.e., a set of pc-tables, cf. Section 2.7 ), a query Q(

x) , and a possible

¯

tuple

a , intensional evaluation computes P (Q) in two conceptual steps: first, compute the lineage

Q [ a/x ]

, then compute the probability P () . In practice, these steps are often intertwined. The

size of the lineage expression Q depends both on the query and on the database. This dependency

is polynomial in D and exponential in Q . More precisely, the size of the lineage formula is

=

| Q |=

m ) , where m is the number of variables in Q , and ADom is the active domain of the

database, i.e., the set of all constants occurring in the database.

The data complexity for the first step is in polynomial time, meaning that for any fixed query

Q , one can compute Q in polynomial time in the size of the database. The main complexity in

intensional query evaluation comes from the second step. Here we need to compute P () , where

is a propositional formula, whose size depends on the input database. This is a key difference to

extensional query evaluation; for example, we cannot apply the inclusion-exclusion rule on because

its complexity would be exponential in the size of and thus in the size of the input database.

The probability computation problem for propositional formulas has been studied both in

the verification and in the AI communities. We start by reviewing the most basic approaches for

computing P () in Section 5.1 . These can be best explained in terms of a set of rules, which compute

P () as a function of simpler propositional formulas P( 1 ),...,P( n ) . The rules either exploit

independence between formulas, or disjointness, or apply Shannon's expansion on one Boolean

variable X . The complexity of evaluating P () depends significantly on the order in which the rules

are applied, and on the order in which variables are eliminated during the Shannon expansion step.

Formula compilation refers to the translation of a propositional formula into a circuit for

computing . The circuit is required to have certain properties that ensure that one can compute

P () in linear time in the size of the circuit. We discuss in Section 5.2 four well-known types

of circuits, also called compilation targets : read-once formulas, OBDD, FBDD, and d-DNNF. One

approach to computing P () is thus to first compile and then compute its probability. The

advantage of this approach is that the compiled circuit can also be used for other applications,

besides the probability computation [ Darwiche and Marquis , 2002 ]. Most evaluation algorithms

for P () can be converted into an algorithm for constructing a circuit for [ Darwiche , 2006 ].

O( |

ADom

|

Probabilistic Databases

Search WWH ::

Custom Search

Home