Databases Reference
In-Depth Information
on b 1 just as it did on b . This traversal and the resultant trees are shown in Figure 5.12.
The child trees ACD
AB are created again but now with the new values
from the b 1 subtree. For example, notice that the aggregate count of c in the ACD
=
A and ABD
=
A
tree has increased from 1 to 3. The trees that remained intact during the last traversal
are reused and the new aggregate values are added on. For instance, another branch is
added to the BCD tree.
Just like before, the algorithm will reach a leaf node at d and traverse back. This
time, it will reach a 1 and notice that there exists a sibling in a 2 . In this case, all child
trees except BCD in Figure 5.12 are destroyed. Afterward, the algorithm will perform
the same traversal on a 2 . BCD continues to grow while the other subtrees start fresh
with a 2 instead of a 1 .
=
A node must satisfy two conditions in order to generate child trees: (1) the measure
of the node must satisfy the iceberg condition; and (2) the tree to be generated must
include at least one non-star-node (i.e., nontrivial). This is because if all the nodes were
star-nodes, then none of them would satisfy min sup . Therefore, it would be a complete
waste to compute them. This pruning is observed in Figures 5.11 and 5.12. For example,
the left subtree extending from node a 1 in the base tree in Figure 5.11 does not include
any nonstar-nodes. Therefore, the a 1 CD
a 1 subtree should not have been generated. It
is shown, however, for illustration of the child tree generation process.
Star-Cubing is sensitive to the ordering of dimensions, as with other iceberg cube
construction algorithms. For best performance, the dimensions are processed in order
of decreasing cardinality. This leads to a better chance of early pruning, because the
higher the cardinality, the smaller the partitions, and therefore the higher possibility
that the partition will be pruned.
Star-Cubing can also be used for full cube computation. When computing the full
cube for a dense data set, Star-Cubing's performance is comparable with MultiWay and
is much faster than BUC. If the data set is sparse, Star-Cubing is significantly faster
than MultiWay and faster than BUC, in most cases. For iceberg cube computation, Star-
Cubing is faster than BUC, where the data are skewed and the speed-up factor increases
as min sup decreases.
=
5.2.4 Precomputing Shell Fragments for Fast
High-Dimensional OLAP
Recall the reason that we are interested in precomputing data cubes: Data cubes facil-
itate fast OLAP in a multidimensional data space. However, a full data cube of high
dimensionality needs massive storage space and unrealistic computation time. Iceberg
cubes provide a more feasible alternative, as we have seen, wherein the iceberg con-
dition is used to specify the computation of only a subset of the full cube's cells.
However, although an iceberg cube is smaller and requires less computation time than
its corresponding full cube, it is not an ultimate solution.
For one, the computation and storage of the iceberg cube can still be costly. For exam-
ple, if the base cuboid cell,
.
a 1 , a 2 ,
:::
, a 60 /
, passes minimum support (or the iceberg
 
Search WWH ::




Custom Search