Databases Reference
In-Depth Information
with the apex cuboid at the bottom and the base cuboid at the top. In that view, BUC
does bottom-up construction. However, because we adopt the application worldview
where drill-down refers to drilling from the apex cuboid down toward the base cuboid,
the exploration process of BUC is regarded as top-down. BUC's exploration for the
computation of a 3-D data cube is shown in Figure 5.5.
The BUC algorithm is shown on the next page in Figure 5.6. We first give an expla-
nation of the algorithm and then follow up with an example. Initially, the algorithm is
called with the input relation (set of tuples). BUC aggregates the entire input (line 1)
and writes the resulting total (line 3). (Line 2 is an optimization feature that is discussed
later in our example.) For each dimension d (line 4), the input is partitioned on d (line
6). On return from Partition() , dataCount contains the total number of tuples for each
distinct value of dimension d . Each distinct value of d forms its own partition . Line 8
iterates through each partition. Line 10 tests the partition for minimum support. That
is, if the number of tuples in the partition satisfies (i.e., is ) the minimum support,
then the partition becomes the input relation for a recursive call made to BUC, which
computes the iceberg cube on the partitions for dimensions d C1 to numDims (line 12).
Note that for a full cube (i.e., where minimum support in the having clause is 1), the
minimum support condition is always satisfied. Thus, the recursive call descends one
level deeper into the lattice. On return from the recursive call, we continue with the next
partition for d . After all the partitions have been processed, the entire process is repeated
for each of the remaining dimensions.
Example 5.5 BUC construction of an iceberg cube. Consider the iceberg cube expressed in SQL as
follows:
computecube iceberg cube as
select A , B , C , D , count(*)
from R
cubeby A , B , C , D
havingcount(*)
>D 3
Let's see how BUC constructs the iceberg cube for the dimensions A , B , C , and D , where
3 is the minimum support count. Suppose that dimension A has four distinct values,
a 1 , a 2 , a 3 , a 4 ; B has four distinct values, b 1 , b 2 , b 3 , b 4 ; C has two distinct values, c 1 , c 2 ;
and D has two distinct values, d 1 , d 2 . If we consider each group-by to be a partition ,
then we must compute every combination of the grouping attributes that satisfy the
minimum support (i.e., that have three tuples).
Figure 5.7 illustrates how the input is partitioned first according to the different attri-
bute values of dimension A , and then B , C , and D . To do so, BUC scans the input,
aggregating the tuples to obtain a count for all , corresponding to the cell
.
Dimension A is used to split the input into four partitions, one for each distinct value of
A . The number of tuples (counts) for each distinct value of A is recorded in dataCount .
BUC uses the Apriori property to save time while searching for tuples that satisfy
the iceberg condition. Starting with A dimension value, a 1 , the a 1 partition is aggre-
gated, creating one tuple for the A group-by, corresponding to the cell
., , , /
.
a 1 , , , /
.
 
Search WWH ::




Custom Search