Database Reference
In-Depth Information
The algorithm includes cache-results and amortize-scans strategies by means
of computing nodes with common prefixes in a single scan. This is called
pipelined evaluation in database query optimization. In this way, we could
compute ABCD , ABC , AB ,and A in a single scan because the attribute order
in the view is the sorting order in the file. For example, in the base table below,
with a single scan of the first five tuples, we can compute the aggregations
( a1 , b1 , c1 , 200 ), ( a1 , b1 , c2 , 500 ), ( a1 , b1 , 700 ), ( a1 , b2 , 400 ), and ( a1 , 1100 ).
A B C D
a1 b1 c1 d1 100
a1 b1 c1 d2 100
a1 b1 c2 d1 200
a1 b1 c2 d1 300
a1 b2 c1 d1 400
a2 b1 c1 d1 100
a2 b1 c2 d2 400
··· ··· ··· ··· ···
The input of the algorithm is a data cube lattice in which each edge e ij ,
where node i is the parent of node j , is labeled with two costs, S ( e ij )and
A ( e ij ). S ( e ij ) is the cost of computing j from i if i is not sorted. A ( e ij )is
the cost of computing j from i if i is already sorted. Thus, A ( e ij ) ≤ S ( e ij ).
In addition, we consider the lattice organized into levels, where each level
k contains views with exactly k attributes, starting from All ,where k =0.
This data structure is called a search lattice .
The output of the algorithm is a subgraph of the search lattice such that
each node has exactly one parent from which it will be computed in a certain
mode, that is, sorted or not (note that in the search lattice, each node, except
All , has more than one parent). If the attribute order of a node j is a prefix of
the order of its parent i ,then j can be computed from i without sorting the
latter, and in the resulting graph, the edge will have cost A ( e ij ). Otherwise,
i has to be sorted to compute j and the edge will have cost S ( e ij ). Note that
for any node i in an output graph, there can be at most one outgoing edge
marked A and many outgoing edges marked S . The goal of the algorithm is
to find an output graph representing an execution plan such that the sum of
the costs labeling the edges is minimum.
To obtain the minimum cost output graph, the algorithm proceeds level
by level, starting from level 0 until level N − 1, where N is the number of
levels in the search lattice. We find the best way of computing the nodes in
each level k from the nodes in level k +1, reducing the problem to a weighted
bipartite matching problem as follows. Consider a pair ( k,k + 1) of levels.
The algorithm first transforms the level k +1bymaking k copies of each
one of its nodes. Thus, each node in level k + 1 will have k + 1 children,
 
Search WWH ::




Custom Search