Databases Reference
In-Depth Information
(b) Modify your design in (a) to handle
incremental data updates.
Give the reasoning
behind your new design.
5.6
When computing a cube of high dimensionality, we encounter the inherent
curse of
dimensionality
problem: There exists a huge number of subsets of combinations of
dimensions.
(a) Suppose that there are only two base cells, f.
a
1
,
a
2
,
a
3
,
:::
,
a
100
/
and
.
a
1
,
a
2
,
,
b
100
/g, in a 100-D base cuboid. Compute the number of nonempty aggregate
cells. Comment on the storage space and time required to compute these cells.
(b) Suppose we are to compute an iceberg cube from (a). If the minimum support count
in the iceberg condition is 2, how many aggregate cells will there be in the iceberg
cube? Show the cells.
(c) Introducing iceberg cubes will lessen the burden of computing trivial aggregate cells
in a data cube. However, even with iceberg cubes, we could still end up having to
compute a large number of trivial uninteresting cells (i.e., with small counts). Sup-
pose that a database has 20 tuples that map to (or cover) the two following base
cells in a 100-D base cuboid, each with a cell count of 10: f.
b
3
,
:::
a
1
,
a
2
,
a
3
,
:::
,
a
100
/
: 10,
.
a
1
,
a
2
,
b
3
,
:::
,
b
100
/
: 10g.
i.
Let the minimum support be 10. How many distinct aggregate cells will
there be like the following:f.
a
1
,
a
2
,
a
3
,
a
4
,
:::
,
a
99
, /
: 10,
:::
,
.
a
1
,
a
2
, ,
a
4
,
:::
,
a
99
,
a
100
/
: 10, . . . ,
.
a
1
,
a
2
,
a
3
, ,
:::
, , /
: 10g?
ii.
If we ignore all the aggregate cells that can be obtained by replacing some con-
stants with 's while keeping the same measure value, how many distinct cells
remain? What are the cells?
5.7
Propose an algorithm that computes
closed iceberg cubes
efficiently.
5.8
Suppose that we want to compute an iceberg cube for the dimensions,
A
,
B
,
C
,
D
, where
we wish to materialize all cells that satisfy a minimum support count of at least
v
, and
where
cardinality(A)
cardinality(D)
. Show the
BUC
processing tree (which shows the order in which the BUC algorithm explores a data
cube's lattice, starting from
all
) for the construction of this iceberg cube.
5.9
Discuss how you might extend the
Star-Cubing
algorithm to compute iceberg cubes
where the iceberg condition tests for an
avg
that is no bigger than some value,
v
.
5.10
A flight data warehouse for a travel agent consists of six dimensions:
traveler, departure
(city), departure time, arrival, arrival time
, and
flight
; and two measures:
count()
and
avgfare()
, where
avgfare()
stores the concrete fare at the lowest level but the average fare
at other levels.
(a) Suppose the cube is fully materialized. Starting with the
base cuboid
[
traveler
,
depar-
ture
,
departure time
,
arrival
,
arrival time
,
flight
], what specific OLAP operations
(e.g., roll-up
flight
to
airline
) should one perform to list the average fare per month
for
each business traveler
who flies American Airlines (
AA
) from Los Angeles in 2009?
<
cardinality(B)
<
cardinality(C)
<