Database Reference
In-Depth Information
4.2
QUERY EVALUATION USING EXTENSIONAL PLANS
An extensional operator is a standard relational algebra operator that is extended to manipulate
tuple probabilities explicitly. For example, a join will multiply the probabilities, a projection with
duplicate elimination may treat tuples as either independent and compute the output probability
as 1
... An
extensional plan for a query Q is a relational plan with extensional operators. In general, the plan
may not compute the output tuple probabilities correctly; when it computes them correctly for any
input tuple-independent (or BID) database, then the plan is called a safe plan . For example, the plan
on the left of Figure 4.3 is an extensional, but unsafe plan, while the plan on the right is safe.
In this section, we describe several extensional operators and show that every safe query Q can
translated into a safe plan. This is very important for practical purposes because it means that one
can evaluate Q by using existing query evaluation engines and take advantage of indexes, different
join implementations, cost-based query optimization, or parallelism. If a query is safe, then it scales
to large probabilistic databases like queries on regular databases. If the query is unsafe, then one can
still use an extensional plan to evaluate Q on a probabilistic database: it still scales, but the output
probabilities are incorrect. We also discuss briefly in this section how extensional plans can be used
to approximate queries that are unsafe.
In this section, we no longer restrict the queries to Boolean queries; instead, we allow a query to
have arbitrary head variables. We continue to assume that all input relations are tuple-independent,
and we will discuss extensions to BID tables in Section 4.3 .
The extensional plans will manipulate relations with an explicit probability attribute. That is,
every relation R has schema of the form R( A, P ) , where A are the regular attributes and P is the
probability. We refer to A (R) as the deterministic part of R .If u
( 1
p 1 )
·
( 1
p 2 )
···
, or it may treat tuples as disjoint and compute p 1 +
p 2 +
R is a tuple, then u. A is a regular
A -tuple
tuple, and u.P is its probability. Given an
a , we write P R ( a) for the probability of
a in
u. A for some tuple u
R : That is, if
a
¯
=
R then P R (
a)
¯
=
u.P ; otherwise, P R (
a)
¯
=
0. We assume
A isakeyin R( A, P ) .
w.l.o.g. that every tuple
a occurs at most once in R ; in other words,
4.2.1 EXTENSIONAL OPERATORS
We now describe each of the extensional operators that are used in safe plans.
4.2.1.1 Independent Join
The independent join of two relations R( A, P ), S( B,P) , in notation R
i C S , is the regular join of
C B (S) , augmented with the product of the two probabilities,
their deterministic parts, A (R)
from R and from S :
i
a, b, P R (
P S ( b))
A (R), b
a, b)
R
C S
={
(
¯
a)
¯
·
a
B (S), (
¯
A (R)
C B (S)
}
A and
B .
Here C is the join condition, expressed over the attributes
Search WWH ::




Custom Search