Database Reference
In-Depth Information
Fix the sort order of
i
as the order of the level
k
node
connected to
i
by an
A
()
edge;
Generate-Plan(
k
+1
→ k
);
Create
k
additional copies of each level
k
+1
node;
Connect each copy node to the same set of level
k
nodes as the original node;
Assign cost
A
(
e
ij
)
to edges
e
ij
from the original nodes and cost
S
(
e
ij
)
to edges from the copy nodes;
Find the minimum cost matching on the transformed level
k
+1
with level
k
;
Figure
7.8
shows an evaluation plan for computing the cube lattice of
Fig.
7.6
using the PipeSort algorithm. The minimum cost sort plan will first
sort the base fact table in
CBAD
order and compute
CBA
,
CB
,
C
,and
All
aggregations in a pipelined fashion. Then, we sort the base fact table in the
BADC
order and proceed as above to compute aggregates
BAD
,
BA
,and
A
.
We continue in the same way with
ACDB
and
DBCA
.Notehowtheviewsin
level 1 (
A
,
B
,
C
,and
D
) are computed from the views in level 2 in the way
that was indicated by the bipartite graph matching in Fig.
7.7
.
All
C
B
A
D
CB
BA
AC
DB
AD
CD
CBA
BAD
ACD
DBC
CBAD
Fig. 7.8
Evaluation plan for computing the cube lattice in Fig.
7.6
7.4.2 Cube Size Estimation
We have already said that algorithms like PipeSort, and most algorithms
computing summary tables, require knowing the size of each aggregate.
However, in general this is not known in advance. Thus, we need to accurately
predict the sizes of the different aggregates. There are three classic methods
for this, although a wide array of statistical techniques could be used. The
first of these methods is purely analytical, the second is based on sampling,
and the last one on probabilistic counting.
The
analytical algorithm
is based on a result by Feller from 1957, stating
that choosing
r
elements (which we can assume are the tuples in a relation)
randomly from a set of
n
elements (which are all the different values a set
Search WWH ::
Custom Search