Data Cube Technology - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

(b) Modify your design in (a) to handle incremental data updates. Give the reasoning

behind your new design.

5.6 When computing a cube of high dimensionality, we encounter the inherent curse of

dimensionality problem: There exists a huge number of subsets of combinations of

dimensions.

(a) Suppose that there are only two base cells, f.

a 1 , a 2 , a 3 ,

:::

, a 100 /

and

a 1 , a 2 ,

, b 100 /g, in a 100-D base cuboid. Compute the number of nonempty aggregate

cells. Comment on the storage space and time required to compute these cells.

(b) Suppose we are to compute an iceberg cube from (a). If the minimum support count

in the iceberg condition is 2, how many aggregate cells will there be in the iceberg

cube? Show the cells.

in a data cube. However, even with iceberg cubes, we could still end up having to

compute a large number of trivial uninteresting cells (i.e., with small counts). Sup-

pose that a database has 20 tuples that map to (or cover) the two following base

cells in a 100-D base cuboid, each with a cell count of 10: f.

b 3 ,

:::

a 1 , a 2 , a 3 ,

:::

, a 100 /

: 10,

a 1 , a 2 , b 3 ,

:::

, b 100 /

: 10g.

Let the minimum support be 10. How many distinct aggregate cells will

there be like the following:f.

a 1 , a 2 , a 3 , a 4 ,

:::

, a 99 , /

: 10,

:::

a 1 , a 2 , , a 4 ,

:::

a 99 , a 100 /

: 10, . . . ,

a 1 , a 2 , a 3 , ,

:::

, , /

: 10g?

ii.

If we ignore all the aggregate cells that can be obtained by replacing some con-

stants with 's while keeping the same measure value, how many distinct cells

remain? What are the cells?

5.7 Propose an algorithm that computes closed iceberg cubes efficiently.

5.8 Suppose that we want to compute an iceberg cube for the dimensions, A , B , C , D , where

we wish to materialize all cells that satisfy a minimum support count of at least v , and

where cardinality(A)

cardinality(D) . Show the BUC

processing tree (which shows the order in which the BUC algorithm explores a data

cube's lattice, starting from all ) for the construction of this iceberg cube.

5.9 Discuss how you might extend the Star-Cubing algorithm to compute iceberg cubes

where the iceberg condition tests for an avg that is no bigger than some value, v .

5.10 A flight data warehouse for a travel agent consists of six dimensions: traveler, departure

(city), departure time, arrival, arrival time , and flight ; and two measures: count() and

avgfare() , where avgfare() stores the concrete fare at the lowest level but the average fare

at other levels.

(a) Suppose the cube is fully materialized. Starting with the base cuboid [ traveler , depar-

ture , departure time , arrival , arrival time , flight ], what specific OLAP operations

(e.g., roll-up flight to airline ) should one perform to list the average fare per month

for each business traveler who flies American Airlines ( AA ) from Los Angeles in 2009?

cardinality(B)

cardinality(C)

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home