Physical Data Warehouse Design - Data Warehouse Systems: Design and Implementation

Database Reference

In-Depth Information

Fix the sort order of i as the order of the level k node

connected to i by an A () edge;

Generate-Plan( k +1

→ k );

Create k additional copies of each level k +1 node;

Connect each copy node to the same set of level k nodes as the original node;

Assign cost A ( e ij ) to edges e ij from the original nodes and cost S ( e ij )

to edges from the copy nodes;

Find the minimum cost matching on the transformed level k +1 with level k ;

Figure 7.8 shows an evaluation plan for computing the cube lattice of

Fig. 7.6 using the PipeSort algorithm. The minimum cost sort plan will first

sort the base fact table in CBAD order and compute CBA , CB , C ,and All

aggregations in a pipelined fashion. Then, we sort the base fact table in the

BADC order and proceed as above to compute aggregates BAD , BA ,and A .

We continue in the same way with ACDB and DBCA .Notehowtheviewsin

level 1 ( A , B , C ,and D ) are computed from the views in level 2 in the way

that was indicated by the bipartite graph matching in Fig. 7.7 .

All

C

B

A

D

CB

BA

AC

DB

AD

CD

CBA

BAD

ACD

DBC

CBAD

Fig. 7.8 Evaluation plan for computing the cube lattice in Fig. 7.6

7.4.2 Cube Size Estimation

We have already said that algorithms like PipeSort, and most algorithms

computing summary tables, require knowing the size of each aggregate.

However, in general this is not known in advance. Thus, we need to accurately

predict the sizes of the different aggregates. There are three classic methods

for this, although a wide array of statistical techniques could be used. The

first of these methods is purely analytical, the second is based on sampling,

and the last one on probabilistic counting.

The analytical algorithm is based on a result by Feller from 1957, stating

that choosing r elements (which we can assume are the tuples in a relation)

randomly from a set of n elements (which are all the different values a set

Data Warehouse Systems: Design and Implementation

Search WWH ::

Custom Search

Home