Database Reference
In-Depth Information
makes query evaluation on the view no more efficient than expanding the view definition in the
query.
To cope with this problem researchers have considered three main techniques. The first two
techniques exploit the common substructure in the formula. The first idea is a sophisticated form
of caching. The main idea is that the lineage formula may have common sub-components. Rather
than re-evaluating these sub-components, we can simply cache the results and avoid expensive re-
computation. Of course, identifying such sub-components is a difficult problem and is the main
technical challenge. This idea has been applied both to BID-style databases [ Sarmaetal. , 2008b ]
and to graphical model-based approaches [ Senetal. , 2009 ].
A second approach that exploits the regularity of the lineage formula is to approximate the
lineage formula. Here, the main idea is to replace the original Boolean lineage formula t for a tuple t
with a new Boolean formula ˜ t that is smaller (i.e., uses fewer variables). More precisely, given some
ε 0, we can choose a formula
[ ˜ t = ˜ t ]≤ ε , i.e., the disagreement probability is less
than ε . The size of ˜ t is a function only of ε , but the size can be bounded independently from the
size of the original lineage formula. Simply shrinking the lineage has can improve query processing
time: roughly speaking, the Luby-Karp-based algorithms of Section 5.3 that approximate P ()
take time roughly quadratic in the size of the lineage formula. A second advantage of this approach
is that the formula ˜ is syntactically identical to a conventional lineage formula and so requires
no additional machinery to process. Finding the smallest such ˜ t is a computationally challenging
problem. Nevertheless, one can show that even simple greedy solutions still find ˜ t that are hundreds
of times smaller. Using ideas from Boolean Harmonic Analysis, one can replace the functions ˜ t
with multi-linear polynomials which may allow even further compression. But, the resulting lineage
formula require new query processing algorithms.This approach is discussed by Ré and Suciu [ 2008 ].
A third (and more aggressive technique), discussed by Dalvi et al. [ 2011 ], is to simply throw
away the entire lineage formula - essentially assuming that the tuples in the view are a new relation.
Of course, using such a view naively may result in incorrect answers. However, some queries may
not be affected by this choice. A trivial example is that a query which asks for a marginal probability
of a single tuple can be trivially answered. But so can more sophisticated queries. Intuitively, if one
can identify that a query only touches tuples whose correlations are either independent or disjoint
in the view - avoiding those tuples that may have complex correlations - one can use the view to
process the query. Deducing this is non-trivial (the problem is 2 -Complete [ Dalvi et al. , 2011 ]).
Nonetheless, there are efficient sound (but not complete) heuristics that can decide, given a query
Q and a view V ,if V can be used to answer Q [ Dalvi et al. , 2011 ].
˜ t
so that P
Search WWH ::




Custom Search