Data Cube Technology - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

on b 1 just as it did on b . This traversal and the resultant trees are shown in Figure 5.12.

The child trees ACD

AB are created again but now with the new values

from the b 1 subtree. For example, notice that the aggregate count of c in the ACD

=

A and ABD

=

A

tree has increased from 1 to 3. The trees that remained intact during the last traversal

are reused and the new aggregate values are added on. For instance, another branch is

added to the BCD tree.

Just like before, the algorithm will reach a leaf node at d and traverse back. This

time, it will reach a 1 and notice that there exists a sibling in a 2 . In this case, all child

trees except BCD in Figure 5.12 are destroyed. Afterward, the algorithm will perform

the same traversal on a 2 . BCD continues to grow while the other subtrees start fresh

with a 2 instead of a 1 .

=

A node must satisfy two conditions in order to generate child trees: (1) the measure

of the node must satisfy the iceberg condition; and (2) the tree to be generated must

include at least one non-star-node (i.e., nontrivial). This is because if all the nodes were

star-nodes, then none of them would satisfy min sup . Therefore, it would be a complete

waste to compute them. This pruning is observed in Figures 5.11 and 5.12. For example,

the left subtree extending from node a 1 in the base tree in Figure 5.11 does not include

any nonstar-nodes. Therefore, the a 1 CD

a 1 subtree should not have been generated. It

is shown, however, for illustration of the child tree generation process.

Star-Cubing is sensitive to the ordering of dimensions, as with other iceberg cube

construction algorithms. For best performance, the dimensions are processed in order

of decreasing cardinality. This leads to a better chance of early pruning, because the

higher the cardinality, the smaller the partitions, and therefore the higher possibility

that the partition will be pruned.

Star-Cubing can also be used for full cube computation. When computing the full

cube for a dense data set, Star-Cubing's performance is comparable with MultiWay and

is much faster than BUC. If the data set is sparse, Star-Cubing is significantly faster

than MultiWay and faster than BUC, in most cases. For iceberg cube computation, Star-

Cubing is faster than BUC, where the data are skewed and the speed-up factor increases

as min sup decreases.

=

5.2.4 Precomputing Shell Fragments for Fast

High-Dimensional OLAP

Recall the reason that we are interested in precomputing data cubes: Data cubes facil-

itate fast OLAP in a multidimensional data space. However, a full data cube of high

dimensionality needs massive storage space and unrealistic computation time. Iceberg

cubes provide a more feasible alternative, as we have seen, wherein the iceberg con-

dition is used to specify the computation of only a subset of the full cube's cells.

However, although an iceberg cube is smaller and requires less computation time than

its corresponding full cube, it is not an ultimate solution.

For one, the computation and storage of the iceberg cube can still be costly. For exam-

ple, if the base cuboid cell,

.

a 1 , a 2 ,

:::

, a 60 /

, passes minimum support (or the iceberg

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home