Database Reference
In-Depth Information
Fix the sort order of i as the order of the level k node
connected to i by an A () edge;
Generate-Plan( k +1
→ k );
Create k additional copies of each level k +1 node;
Connect each copy node to the same set of level k nodes as the original node;
Assign cost A ( e ij ) to edges e ij from the original nodes and cost S ( e ij )
to edges from the copy nodes;
Find the minimum cost matching on the transformed level k +1 with level k ;
Figure 7.8 shows an evaluation plan for computing the cube lattice of
Fig. 7.6 using the PipeSort algorithm. The minimum cost sort plan will first
sort the base fact table in CBAD order and compute CBA , CB , C ,and All
aggregations in a pipelined fashion. Then, we sort the base fact table in the
BADC order and proceed as above to compute aggregates BAD , BA ,and A .
We continue in the same way with ACDB and DBCA .Notehowtheviewsin
level 1 ( A , B , C ,and D ) are computed from the views in level 2 in the way
that was indicated by the bipartite graph matching in Fig. 7.7 .
All
C
B
A
D
CB
BA
AC
DB
AD
CD
CBA
BAD
ACD
DBC
CBAD
Fig. 7.8 Evaluation plan for computing the cube lattice in Fig. 7.6
7.4.2 Cube Size Estimation
We have already said that algorithms like PipeSort, and most algorithms
computing summary tables, require knowing the size of each aggregate.
However, in general this is not known in advance. Thus, we need to accurately
predict the sizes of the different aggregates. There are three classic methods
for this, although a wide array of statistical techniques could be used. The
first of these methods is purely analytical, the second is based on sampling,
and the last one on probabilistic counting.
The analytical algorithm is based on a result by Feller from 1957, stating
that choosing r elements (which we can assume are the tuples in a relation)
randomly from a set of n elements (which are all the different values a set
 
Search WWH ::




Custom Search