Databases Reference
In-Depth Information
(b) Modify your design in (a) to handle incremental data updates. Give the reasoning
behind your new design.
5.6 When computing a cube of high dimensionality, we encounter the inherent curse of
dimensionality problem: There exists a huge number of subsets of combinations of
dimensions.
(a) Suppose that there are only two base cells, f.
a 1 , a 2 , a 3 ,
:::
, a 100 /
and
.
a 1 , a 2 ,
, b 100 /g, in a 100-D base cuboid. Compute the number of nonempty aggregate
cells. Comment on the storage space and time required to compute these cells.
(b) Suppose we are to compute an iceberg cube from (a). If the minimum support count
in the iceberg condition is 2, how many aggregate cells will there be in the iceberg
cube? Show the cells.
(c) Introducing iceberg cubes will lessen the burden of computing trivial aggregate cells
in a data cube. However, even with iceberg cubes, we could still end up having to
compute a large number of trivial uninteresting cells (i.e., with small counts). Sup-
pose that a database has 20 tuples that map to (or cover) the two following base
cells in a 100-D base cuboid, each with a cell count of 10: f.
b 3 ,
:::
a 1 , a 2 , a 3 ,
:::
, a 100 /
: 10,
.
a 1 , a 2 , b 3 ,
:::
, b 100 /
: 10g.
i.
Let the minimum support be 10. How many distinct aggregate cells will
there be like the following:f.
a 1 , a 2 , a 3 , a 4 ,
:::
, a 99 , /
: 10,
:::
,
.
a 1 , a 2 , , a 4 ,
:::
,
a 99 , a 100 /
: 10, . . . ,
.
a 1 , a 2 , a 3 , ,
:::
, , /
: 10g?
ii.
If we ignore all the aggregate cells that can be obtained by replacing some con-
stants with 's while keeping the same measure value, how many distinct cells
remain? What are the cells?
5.7 Propose an algorithm that computes closed iceberg cubes efficiently.
5.8 Suppose that we want to compute an iceberg cube for the dimensions, A , B , C , D , where
we wish to materialize all cells that satisfy a minimum support count of at least v , and
where cardinality(A)
cardinality(D) . Show the BUC
processing tree (which shows the order in which the BUC algorithm explores a data
cube's lattice, starting from all ) for the construction of this iceberg cube.
5.9 Discuss how you might extend the Star-Cubing algorithm to compute iceberg cubes
where the iceberg condition tests for an avg that is no bigger than some value, v .
5.10 A flight data warehouse for a travel agent consists of six dimensions: traveler, departure
(city), departure time, arrival, arrival time , and flight ; and two measures: count() and
avgfare() , where avgfare() stores the concrete fare at the lowest level but the average fare
at other levels.
(a) Suppose the cube is fully materialized. Starting with the base cuboid [ traveler , depar-
ture , departure time , arrival , arrival time , flight ], what specific OLAP operations
(e.g., roll-up flight to airline ) should one perform to list the average fare per month
for each business traveler who flies American Airlines ( AA ) from Los Angeles in 2009?
<
cardinality(B)
<
cardinality(C)
<
 
Search WWH ::




Custom Search