Data Cube Technology - Data Mining: Concepts and Techniques


5.16 Multifeature cubes allow us to construct interesting data cubes based on rather sophisticated query conditions. Can you construct the following multifeature cubes by translating these user requests into queries using the form introduced in this textbook?

(a) Construct a smart shopper cube where a shopper is smart if at least 10% of the goods she buys in each shopping trip are on sale.

(b) Construct a data cube for best-deal products, where best-deal products are those products for which the price is the lowest for this product in the given month.
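The textbook's query form for multifeature cubes is not reproduced here. As a hedged illustration of what each request asks the cube to compute, the Python sketch below evaluates both conditions over a small in-memory list of sales. The `Sale` record and its fields (`customer`, `trip`, `product`, `month`, `price`, `on_sale`) are hypothetical, and each record is assumed to be one item bought. Both functions follow the multifeature pattern: a first aggregate (per-trip counts, or the monthly minimum price) is computed, and then a dependent test or selection is applied over it.

```python
from collections import defaultdict
from typing import NamedTuple


class Sale(NamedTuple):
    # Hypothetical record layout: one row per item bought.
    customer: str
    trip: int          # shopping-trip identifier
    product: str
    month: int
    price: float
    on_sale: bool


def smart_shoppers(sales, percent=10):
    """Customers for whom, in every trip, at least `percent`% of the
    items bought were on sale (the condition in 5.16(a))."""
    per_trip = defaultdict(lambda: [0, 0])  # (customer, trip) -> [on sale, total]
    for s in sales:
        counts = per_trip[(s.customer, s.trip)]
        counts[0] += int(s.on_sale)
        counts[1] += 1
    verdict = {}
    for (customer, _trip), (on_sale, total) in per_trip.items():
        trip_ok = 100 * on_sale >= percent * total  # integer test, no rounding
        verdict[customer] = verdict.get(customer, True) and trip_ok
    return {c for c, ok in verdict.items() if ok}


def best_deals(sales):
    """Sales whose price is the lowest for that product in that month
    (the condition in 5.16(b))."""
    min_price = {}  # first pass: MIN(price) per (product, month)
    for s in sales:
        key = (s.product, s.month)
        if key not in min_price or s.price < min_price[key]:
            min_price[key] = s.price
    # Second pass: keep the sales that reach that minimum.
    return [s for s in sales if s.price == min_price[(s.product, s.month)]]


example_sales = [
    Sale("ann", 1, "milk", 1, 2.0, True),
    Sale("ann", 1, "bread", 1, 3.0, False),
    Sale("ann", 2, "milk", 1, 1.5, True),
    Sale("bob", 3, "milk", 1, 2.5, False),
    Sale("bob", 3, "eggs", 1, 4.0, False),
]
print(smart_shoppers(example_sales))
print([s.product for s in best_deals(example_sales)])
```

In this toy data, "ann" qualifies because both of her trips contain at least one sale item, while "bob" bought nothing on sale. Only the 1.5 milk purchase survives as the month's best deal for milk.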

5.17 Discovery-driven cube exploration is a desirable way to mark interesting points among a large number of cells in a data cube. Individual users may have different views on whether a point should be considered interesting enough to be marked. Suppose one would like to mark those objects whose z-score has an absolute value over 2 in every row and column in a d-dimensional plane.
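The marking criterion is a z-score, so it helps to write it in terms of quantities a cube computation can accumulate. The symbols below are introduced only for illustration: $x$ is a cell value, and $n$, $S$, and $Q$ are the count, sum, and sum of squares of the values in its row (or column). The population standard deviation is assumed. Because $n$, $S$, and $Q$ are distributive, the mean and standard deviation are algebraic, so they can be gathered in the same scan that aggregates the cube.

```latex
z = \frac{x - \mu}{\sigma}, \qquad
\mu = \frac{S}{n}, \qquad
\sigma = \sqrt{\frac{Q}{n} - \left(\frac{S}{n}\right)^{2}},
\qquad \text{where } n = \sum_{i} 1,\; S = \sum_{i} x_i,\; Q = \sum_{i} x_i^{2}.
```

Under one reading of the criterion, a point is marked when $|z| > 2$ both with respect to its row and with respect to its column.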

(a) Derive an efficient computation method to identify such points during the data cube computation.

(b) Suppose a partially materialized cube has (d − 1)-dimensional and (d + 1)-dimensional cuboids materialized but not the d-dimensional one. Derive an efficient method to mark those (d − 1)-dimensional cells with d-dimensional children that contain such marked points.
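One possible approach to both parts can be sketched in Python under stated assumptions. It is not the textbook's solution. The assumptions are:

* The plane is a dense 2-D list of measure values.
* "In every row and column" is read as requiring |z| > 2 within both the cell's row and its column.
* The population standard deviation is used.
* Cells are tuples of dimension values, with "*" standing for an aggregated dimension.
* The measure is SUM.

For part (a), `mark_outliers_2d` gathers count, sum, and sum of squares per row and per column in a single scan and then tests each cell. For part (b), `roll_up` derives d-dimensional cells from the materialized (d + 1)-dimensional cuboid by summing out one dimension. `mark_parents` then marks every (d − 1)-dimensional parent of a marked d-dimensional cell. All three function names are hypothetical.

```python
import math
from collections import defaultdict


def mark_outliers_2d(plane, limit=2.0):
    """Return (i, j) positions whose |z| exceeds `limit` within both
    their row and their column (population standard deviation)."""
    if not plane:
        return set()
    # One scan: accumulate distributive stats [count, sum, sum of squares].
    row_stats = [[0, 0.0, 0.0] for _ in range(len(plane))]
    col_stats = [[0, 0.0, 0.0] for _ in range(len(plane[0]))]
    for i, row in enumerate(plane):
        for j, x in enumerate(row):
            for st in (row_stats[i], col_stats[j]):
                st[0] += 1
                st[1] += x
                st[2] += x * x

    def z(x, st):
        n, s, q = st
        mean = s / n
        var = max(q / n - mean * mean, 0.0)  # guard against tiny negative rounding
        return 0.0 if var == 0 else (x - mean) / math.sqrt(var)

    return {(i, j)
            for i, row in enumerate(plane)
            for j, x in enumerate(row)
            if abs(z(x, row_stats[i])) > limit and abs(z(x, col_stats[j])) > limit}


def roll_up(cells, drop):
    """Sum out dimension index `drop` from {coords: measure}, e.g. to derive
    d-dimensional cells from the materialized (d+1)-dimensional cuboid."""
    out = defaultdict(float)
    for coords, value in cells.items():
        out[coords[:drop] + coords[drop + 1:]] += value
    return dict(out)


def mark_parents(marked, star="*"):
    """Return every (d-1)-dimensional parent (one coordinate generalized
    to `star`) of the given marked d-dimensional cells."""
    return {cell[:k] + (star,) + cell[k + 1:]
            for cell in marked
            for k in range(len(cell))}


plane = [[0.0] * 6 for _ in range(6)]
plane[0][0] = 100.0
print(mark_outliers_2d(plane))
```

With the population standard deviation, no value in a row of n values can have |z| larger than the square root of n − 1. A row or column therefore needs at least six values before any point can pass the |z| > 2 test. The single-pass sum-of-squares formula is also less numerically stable than a two-pass computation, which may matter for large measure values.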

5.7 Bibliographic Notes

Efficient computation of multidimensional aggregates in data cubes has been studied by many researchers. Gray, Chaudhuri, Bosworth, et al. [GCB+97] proposed cube-by as a relational aggregation operator generalizing group-by, crosstabs, and subtotals, and categorized data cube measures into three categories: distributive, algebraic, and holistic. Harinarayan, Rajaraman, and Ullman [HRU96] proposed a greedy algorithm for the partial materialization of cuboids in the computation of a data cube. Sarawagi and Stonebraker [SS94] developed a chunk-based computation technique for the efficient organization of large multidimensional arrays. Agarwal, Agrawal, Deshpande, et al. [AAD+96] proposed several guidelines for efficient computation of multidimensional aggregates for ROLAP servers.
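To make the cube-by operator and the measure categories concrete, here is a naive Python sketch. It assumes rows are dictionaries and that SUM is the measure. `cube` evaluates every one of the 2^n group-bys over n dimensions and uses "*" for aggregated dimensions. This is why cube-by subsumes group-by and subtotals. `merge_avg` shows why AVG is algebraic: it can be assembled from a fixed number of distributive partial values, namely (sum, count) per partition. A holistic measure such as MEDIAN has no such bounded-size summary. Both function names are hypothetical, and the sketch ignores the shared-computation optimizations of the methods cited here.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """SUM(measure) for every subset of `dims` (all 2**n group-bys);
    aggregated dimensions are written as "*" in the result keys."""
    result = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(float)
            for r in rows:
                key = tuple(r[d] if d in group else "*" for d in dims)
                agg[key] += r[measure]
            result.update(agg)
    return result


def merge_avg(partials):
    """AVG is algebraic: combine per-partition (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


sales = [
    {"city": "A", "item": "tv", "sales": 2.0},
    {"city": "A", "item": "pc", "sales": 3.0},
    {"city": "B", "item": "tv", "sales": 5.0},
]
for cell, value in sorted(cube(sales, ("city", "item"), "sales").items()):
    print(cell, value)
```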

The chunk-based MultiWay array aggregation method for data cube computation in MOLAP was proposed in Zhao, Deshpande, and Naughton [ZDN97]. Ross and Srivastava [RS97] developed a method for computing sparse data cubes. Iceberg queries were first described in Fang, Shivakumar, Garcia-Molina, et al. [FSGM+98]. BUC, a scalable method that computes iceberg cubes from the apex cuboid downwards, was introduced by Beyer and Ramakrishnan [BR99]. Han, Pei, Dong, and Wang [HPDW01] introduced an H-Cubing method for computing iceberg cubes with complex measures using an H-tree structure.
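BUC's apex-downward strategy can be sketched in a few lines of Python, assuming COUNT as the measure and rows given as dictionaries. The sketch proceeds as follows:

1. Starting from the apex cell, where every dimension is "*", output a cell if its count reaches the minimum support.
2. Partition on each remaining dimension in turn.
3. Recurse only into partitions that still meet the minimum support.

The pruning is safe because COUNT is antimonotonic: a partition below the threshold cannot have a descendant cell above it. This is a simplified illustration of the idea, with no dimension-ordering heuristics or specialized partitioning. It is not the implementation from [BR99], and the function name `buc` is hypothetical.

```python
from collections import defaultdict


def buc(rows, dims, min_sup, start=0, prefix=None, out=None):
    """Iceberg cube with COUNT: return {cell: count} for every cell whose
    count is at least min_sup; "*" marks an aggregated dimension."""
    if prefix is None:
        prefix = ["*"] * len(dims)
    if out is None:
        out = {}
    if len(rows) < min_sup:
        return out
    out[tuple(prefix)] = len(rows)
    for k in range(start, len(dims)):
        partitions = defaultdict(list)
        for r in rows:
            partitions[r[dims[k]]].append(r)
        for value, part in partitions.items():
            if len(part) >= min_sup:  # prune: no descendant can reach min_sup otherwise
                prefix[k] = value
                buc(part, dims, min_sup, k + 1, prefix, out)
        prefix[k] = "*"  # restore before partitioning on the next dimension
    return out


rows = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 1, "b": 1}, {"a": 2, "b": 1}]
print(buc(rows, ("a", "b"), min_sup=2))
```

Passing `start=k + 1` to the recursive call ensures each cell is generated exactly once, as in a set-enumeration tree over the dimensions.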

The Star-Cubing method for computing iceberg cubes with a dynamic star-tree structure was introduced by Xin, Han, Li, and Wah [XHLW03]. MM-Cubing, an efficient
