Advanced Techniques - Probabilistic Databases

Database Reference

In-Depth Information

6. ADVANCED TECHNIQUES

6.4.2 MATERIALIZED VIEWS FOR RELATIONAL PROBABILISTIC

DATABASES

Materialized views are widely used today to speed up query evaluation in relational databases. Early

query optimizers used materialized views that were restricted to indexes (which are simple projections

on the attributes being indexed) and join indexes [ Valduriez , 1987 ]; modern query optimizers can

use arbitrary materialized views [ Agrawal et al. , 2000 ].

When used in probabilistic databases, materialized views can make a dramatic impact. Suppose

we need to evaluate a Boolean query Q on a BID probabilistic database, and assume Q is unsafe.

In this case, one has to use some general-purpose probabilistic inference method, for example, the

FPTRAS by Karp and Luby [ 1983 ] and Karp et al. [ 1989 ], and its performance in practice is much

worse than that of safe plans: one experimental study by Ré et al. [ 2007 ] has observed two orders

of magnitudes difference in performance. However, by rewriting Q in terms of a view it may be

possible to transform it into a safe query, which can be evaluated very efficiently. There is no magic

here: we simply pay the #P cost when we materialize the view, then evaluate the query in polynomial

time at runtime.

Example 6.7 Consider three tuple-independent relations R(C, A) , S(C,A,B) , T(C,B) , and

define the following view:

V(z) :-

R(z,x),S(z,x,y),T(z,y)

Denote V(Z) the schema of the materialized view. Then, all tuples in the materialized view V are

independent. For the intuition behind this statement, notice that for two different constants a = b ,

the Boolean queries V(a) and V(b) depend on disjoint sets of tuples in the input tuple-independent

probabilistic database. The first, V(a) , depends on inputs of the form R(a,...) , S(a,...) , T(a,...) ,

while the second on inputs of the form R(b,...) , S(b,...) , T(b,...) . Thus, V(a) and V(b) are

independent probabilistic events, and we say that the tuples a and b in the view are indepen-

dent. In general, any set of tuples a,b,c,... in the view are independent. Suppose we compute

and store the view, meaning that we will determine all its tuples a,b,c,... and compute their

probabilities. This will be expensive because, for each constant a , the Boolean query V(a) is es-

sentially equivalent to H 0 ( Chapter 3 ); hence, it is #P-hard. Nevertheless, we will pay this cost and

materialize the view. Later, we will use V to answer queries. For example, consider the Boolean

query Q :- R(z,x),S(z,x,y),T(z,y),U(z,v) , where U(C,D) is another tuple-independent rela-

tion. Then Q is #P-hard, but after rewriting it as Q :- V(z),U(z,v) it becomes a safe query, and it

can be computed by a safe plan. Thus, by using V to evaluate Q , we obtain a dramatic reduction in

complexity.

The major challenge in using probabilistic views for query processing is how to find, represent,

and use independence relationships between the tuples in the view. In general, the tuples in the view

may be correlated in complex ways. One possibility is to store the lineage for each tuple t , but this

Search WWH ::

Custom Search

Home