Database Reference
In-Depth Information
6. ADVANCED TECHNIQUES
6.4.2 MATERIALIZED VIEWS FOR RELATIONAL PROBABILISTIC
DATABASES
Materialized views are widely used today to speed up query evaluation in relational databases. Early
query optimizers used materialized views that were restricted to indexes (which are simple projections
on the attributes being indexed) and join indexes [ Valduriez , 1987 ]; modern query optimizers can
use arbitrary materialized views [ Agrawal et al. , 2000 ].
When used in probabilistic databases, materialized views can make a dramatic impact. Suppose
we need to evaluate a Boolean query Q on a BID probabilistic database, and assume Q is unsafe.
In this case, one has to use some general-purpose probabilistic inference method, for example, the
FPTRAS by Karp and Luby [ 1983 ] and Karp et al. [ 1989 ], and its performance in practice is much
worse than that of safe plans: one experimental study by RĂ© et al. [ 2007 ] has observed two orders
of magnitudes difference in performance. However, by rewriting Q in terms of a view it may be
possible to transform it into a safe query, which can be evaluated very efficiently. There is no magic
here: we simply pay the #P cost when we materialize the view, then evaluate the query in polynomial
time at runtime.
Example 6.7 Consider three tuple-independent relations R(C, A) , S(C,A,B) , T(C,B) , and
define the following view:
V(z) :-
R(z,x),S(z,x,y),T(z,y)
Denote V(Z) the schema of the materialized view. Then, all tuples in the materialized view V are
independent. For the intuition behind this statement, notice that for two different constants a = b ,
the Boolean queries V(a) and V(b) depend on disjoint sets of tuples in the input tuple-independent
probabilistic database. The first, V(a) , depends on inputs of the form R(a,...) , S(a,...) , T(a,...) ,
while the second on inputs of the form R(b,...) , S(b,...) , T(b,...) . Thus, V(a) and V(b) are
independent probabilistic events, and we say that the tuples a and b in the view are indepen-
dent. In general, any set of tuples a,b,c,... in the view are independent. Suppose we compute
and store the view, meaning that we will determine all its tuples a,b,c,... and compute their
probabilities. This will be expensive because, for each constant a , the Boolean query V(a) is es-
sentially equivalent to H 0 ( Chapter 3 ); hence, it is #P-hard. Nevertheless, we will pay this cost and
materialize the view. Later, we will use V to answer queries. For example, consider the Boolean
query Q :- R(z,x),S(z,x,y),T(z,y),U(z,v) , where U(C,D) is another tuple-independent rela-
tion. Then Q is #P-hard, but after rewriting it as Q :- V(z),U(z,v) it becomes a safe query, and it
can be computed by a safe plan. Thus, by using V to evaluate Q , we obtain a dramatic reduction in
complexity.
The major challenge in using probabilistic views for query processing is how to find, represent,
and use independence relationships between the tuples in the view. In general, the tuples in the view
may be correlated in complex ways. One possibility is to store the lineage for each tuple t , but this
Search WWH ::




Custom Search