Physical Data Warehouse Design - Data Warehouse Systems: Design and Implementation - page 243

Database Reference

In-Depth Information

The algorithm includes cache-results and amortize-scans strategies by means

of computing nodes with common prefixes in a single scan. This is called

pipelined evaluation in database query optimization. In this way, we could

compute ABCD , ABC , AB ,and A in a single scan because the attribute order

in the view is the sorting order in the file. For example, in the base table below,

with a single scan of the first five tuples, we can compute the aggregations

( a1 , b1 , c1 , 200 ), ( a1 , b1 , c2 , 500 ), ( a1 , b1 , 700 ), ( a1 , b2 , 400 ), and ( a1 , 1100 ).

A B C D

a1 b1 c1 d1 100

a1 b1 c1 d2 100

a1 b1 c2 d1 200

a1 b1 c2 d1 300

a1 b2 c1 d1 400

a2 b1 c1 d1 100

a2 b1 c2 d2 400

··· ··· ··· ··· ···

The input of the algorithm is a data cube lattice in which each edge e ij ,

where node i is the parent of node j , is labeled with two costs, S ( e ij )and

A ( e ij ). S ( e ij ) is the cost of computing j from i if i is not sorted. A ( e ij )is

the cost of computing j from i if i is already sorted. Thus, A ( e ij ) ≤ S ( e ij ).

In addition, we consider the lattice organized into levels, where each level

k contains views with exactly k attributes, starting from All ,where k =0.

This data structure is called a search lattice .

The output of the algorithm is a subgraph of the search lattice such that

each node has exactly one parent from which it will be computed in a certain

mode, that is, sorted or not (note that in the search lattice, each node, except

All , has more than one parent). If the attribute order of a node j is a prefix of

the order of its parent i ,then j can be computed from i without sorting the

latter, and in the resulting graph, the edge will have cost A ( e ij ). Otherwise,

i has to be sorted to compute j and the edge will have cost S ( e ij ). Note that

for any node i in an output graph, there can be at most one outgoing edge

marked A and many outgoing edges marked S . The goal of the algorithm is

to find an output graph representing an execution plan such that the sum of

the costs labeling the edges is minimum.

To obtain the minimum cost output graph, the algorithm proceeds level

by level, starting from level 0 until level N − 1, where N is the number of

levels in the search lattice. We find the best way of computing the nodes in

each level k from the nodes in level k +1, reducing the problem to a weighted

bipartite matching problem as follows. Consider a pair ( k,k + 1) of levels.

The algorithm first transforms the level k +1bymaking k copies of each

one of its nodes. Thus, each node in level k + 1 will have k + 1 children,

Next Page

Data Warehouse Systems: Design and Implementation

Search WWH ::

Custom Search

Home