5.16 Multifeature cubes allow us to construct interesting data cubes based on rather sophisticated query conditions. Can you construct the following multifeature cubes by translating the user requests below into queries using the form introduced in this textbook?
(a) Construct a smart shopper cube where a shopper is smart if at least 10% of the goods
she buys in each shopping trip are on sale.
(b) Construct a data cube for best-deal products, where best-deal products are those whose price is the lowest price for that product in the given month.
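The grouping semantics behind both requests can be sketched outside the textbook's query form. The Python below is a minimal sketch, not a model answer. The `Purchase` record and its fields, and the functions `smart_shoppers` and `best_deals`, are hypothetical. The 10% share is assumed to be counted by items rather than by spend. Only a single group-by is shown; a full cube would repeat the same dependent aggregation for every combination of dimensions. Both functions follow the multifeature pattern: compute one aggregate per group, then use it to qualify or select the tuples of that group.

```python
from collections import defaultdict
from typing import NamedTuple


class Purchase(NamedTuple):
    # Hypothetical schema, assumed for illustration only.
    shopper: str
    trip: int
    product: str
    month: str
    price: float
    on_sale: bool


def smart_shoppers(purchases, min_pct=10):
    """Shoppers with at least min_pct% of items on sale on *every* trip.

    Assumption: the share is counted by items, not by amount spent."""
    per_trip = defaultdict(lambda: [0, 0])  # (shopper, trip) -> [on-sale items, all items]
    for p in purchases:
        counts = per_trip[(p.shopper, p.trip)]
        counts[0] += int(p.on_sale)
        counts[1] += 1
    smart = {}
    for (shopper, _trip), (on_sale, total) in per_trip.items():
        # Integer comparison avoids floating-point error in the percentage test.
        trip_ok = 100 * on_sale >= min_pct * total
        smart[shopper] = smart.get(shopper, True) and trip_ok
    return {s for s, ok in smart.items() if ok}


def best_deals(purchases):
    """Purchases priced at the lowest price of that product in that month."""
    lowest = {}  # first feature: min(price) per (product, month) group
    for p in purchases:
        key = (p.product, p.month)
        lowest[key] = min(lowest.get(key, p.price), p.price)
    # Second feature: keep the tuples R of each group with R.price == min(price).
    return [p for p in purchases if p.price == lowest[(p.product, p.month)]]
```

A shopper who buys ten items on a trip with exactly one on sale meets the 10% bar for that trip; with eleven items and one on sale, the trip fails and the shopper is no longer smart.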
5.17 Discovery-driven cube exploration is a desirable way to mark interesting points among a large number of cells in a data cube. Individual users may have different views on whether a point should be considered interesting enough to be marked. Suppose one would like to mark those objects whose $z$-score has an absolute value greater than 2 in every row and column of a $d$-dimensional plane.
(a) Derive an efficient computation method to identify such points during the data cube
computation.
(b) Suppose a partially materialized cube has $(d-1)$-dimensional and $(d+1)$-dimensional cuboids materialized but not the $d$-dimensional one. Derive an efficient method to mark those $(d-1)$-dimensional cells with $d$-dimensional children that contain such marked points.
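One straightforward (not necessarily optimal) approach can be sketched in Python for $d = 2$. The sketch reads "in every row and column" as requiring a cell's $|z|$ to exceed 2 both within its row and within its column; that reading, the dictionary layout of cells, and the function names `mark_outliers` and `flag_parents` are assumptions. For part (a), per-row and per-column count, sum, and sum of squares can be accumulated in the same pass that aggregates the cells, because mean and standard deviation are algebraic measures derivable from them. For part (b), the sketch rebuilds the missing 2-D cuboid by summing out the extra dimension of the materialized 3-D cuboid, then flags the 1-D cells (written with `*` for the unbound dimension) that have a marked child. It does not exploit the materialized $(d-1)$-dimensional cuboid.

```python
import math
from collections import defaultdict


def mark_outliers(plane, z_min=2.0):
    """Part (a), d = 2: cells (i, j) whose |z| exceeds z_min within both their
    row and their column. `plane` maps (i, j) -> aggregated measure."""
    rows = defaultdict(lambda: [0, 0.0, 0.0])  # i -> [count, sum, sum of squares]
    cols = defaultdict(lambda: [0, 0.0, 0.0])  # j -> [count, sum, sum of squares]
    for (i, j), v in plane.items():  # can piggyback on the pass that builds the cells
        for acc in (rows[i], cols[j]):
            acc[0] += 1
            acc[1] += v
            acc[2] += v * v

    def z(v, acc):
        n, s, sq = acc
        mean = s / n
        var = max(sq / n - mean * mean, 0.0)  # population variance (naive formula)
        return 0.0 if var == 0.0 else (v - mean) / math.sqrt(var)

    return {(i, j) for (i, j), v in plane.items()
            if abs(z(v, rows[i])) > z_min and abs(z(v, cols[j])) > z_min}


def flag_parents(cuboid_3d, z_min=2.0):
    """Part (b), d = 2: rebuild the missing 2-D cuboid from the materialized
    3-D one, then flag the 1-D cells ('*' = unbound) that have a marked child."""
    plane = defaultdict(float)
    for (i, j, _k), v in cuboid_3d.items():
        plane[(i, j)] += v  # sum out the third dimension
    marked = mark_outliers(plane, z_min)
    return {(i, "*") for i, _ in marked} | {("*", j) for _, j in marked}
```

On a 6 × 6 plane of ones with one cell raised to 100, that cell scores $z = \sqrt{5} \approx 2.24$ in both its row and its column and is the only one marked. With the population standard deviation, a value among $n$ can reach at most $|z| = \sqrt{n-1}$, so a row or column with five or fewer cells can never exceed 2.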
5.7 Bibliographic Notes
Efficient computation of multidimensional aggregates in data cubes has been studied by many researchers. Gray, Chaudhuri, Bosworth, et al. [GCB+97] proposed cube-by as a relational aggregation operator generalizing group-by, crosstabs, and subtotals, and categorized data cube measures into three categories: distributive, algebraic, and holistic. Harinarayan, Rajaraman, and Ullman [HRU96] proposed a greedy algorithm for the partial materialization of cuboids in the computation of a data cube. Sarawagi and Stonebraker [SS94] developed a chunk-based computation technique for the efficient organization of large multidimensional arrays. Agarwal, Agrawal, Deshpande, et al. [AAD+96] proposed several guidelines for efficient computation of multidimensional aggregates for ROLAP servers.
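The cube-by operator and the three measure categories can be illustrated with a short Python sketch; the function names `cube_by` and `roll_up` and the sample rows are hypothetical. `cube_by` forms every one of the $2^n$ group-bys over $n$ dimensions, writing `*` for a dimension that has been aggregated away. `roll_up` then shows why sum is distributive and avg is algebraic: a parent cell's values follow from each child's bounded (sum, count) pair alone. A holistic measure such as the median has no such bounded state, so the median of the children's medians is generally not the parent's median.

```python
from itertools import combinations
from statistics import median


def cube_by(rows, dims, measure):
    """All 2**len(dims) group-bys; '*' marks a dimension aggregated away.

    Returns cell -> list of raw measure values, which suffices for any measure."""
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            for row in rows:
                cell = tuple(row[d] if d in group else "*" for d in dims)
                cube.setdefault(cell, []).append(row[measure])
    return cube


def roll_up(children):
    """Parent (sum, avg) from each child's bounded (sum, count) state alone."""
    total = sum(s for s, _ in children)  # sum is distributive: a sum of partial sums
    count = sum(n for _, n in children)  # count is distributive as well
    return total, total / count  # avg is algebraic: a function of (sum, count)


rows = [  # hypothetical sample data
    {"city": "A", "year": 2020, "sales": 1},
    {"city": "A", "year": 2021, "sales": 2},
    {"city": "A", "year": 2021, "sales": 3},
    {"city": "B", "year": 2020, "sales": 10},
]
cube = cube_by(rows, ["city", "year"], "sales")
children = [cube[("A", "*")], cube[("B", "*")]]
parent_sum, parent_avg = roll_up([(sum(v), len(v)) for v in children])
# Holistic: the median of child medians need not equal the parent's median.
median_of_medians = median([median(v) for v in children])
```

Here the apex cell's sum (16) and average (4.0) follow exactly from the city-level pairs (6, 3) and (10, 1), whereas the median of the two city-level medians is 6.0 against a true overall median of 2.5.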
The chunk-based MultiWay array aggregation method for data cube computation in MOLAP was proposed by Zhao, Deshpande, and Naughton [ZDN97]. Ross and Srivastava [RS97] developed a method for computing sparse data cubes. Iceberg queries were first described in Fang, Shivakumar, Garcia-Molina, et al. [FSGM+98]. BUC, a scalable method that computes iceberg cubes from the apex cuboid downwards, was introduced by Beyer and Ramakrishnan [BR99]. Han, Pei, Dong, and Wang [HPDW01] introduced an H-Cubing method for computing iceberg cubes with complex measures using an H-tree structure.
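The apex-downward strategy of BUC can be sketched as follows. This is a simplified Python rendering under stated assumptions: the measure is a plain count, partitions are formed with a dictionary rather than the sorting-based partitioning of the published algorithm, and the names `buc` and `min_sup` are hypothetical. Pruning is sound because count is anti-monotonic: a partition that falls below the threshold cannot have any descendant cell that meets it.

```python
from collections import defaultdict


def buc(tuples, num_dims, min_sup):
    """Iceberg cube (count measure) computed from the apex cell downwards.

    Each cell is a tuple with '*' for unbound dimensions; dimensions are bound
    in increasing index order, so every qualifying cell is produced exactly once."""
    result = {}

    def recurse(part, start, cell):
        result[tuple(cell)] = len(part)  # this cell already passed min_sup
        for d in range(start, num_dims):
            groups = defaultdict(list)  # partition the current tuples on dimension d
            for t in part:
                groups[t[d]].append(t)
            for value, sub in groups.items():
                if len(sub) >= min_sup:  # otherwise prune this whole branch
                    cell[d] = value
                    recurse(sub, d + 1, cell)
                    cell[d] = "*"

    if len(tuples) >= min_sup:
        recurse(tuples, 0, ["*"] * num_dims)
    return result
```

With `min_sup = 2` and the four tuples `("a", "x")`, `("a", "y")`, `("a", "x")`, `("b", "x")`, the sketch keeps the apex cell (count 4), `("a", "*")` (3), `("a", "x")` (2), and `("*", "x")` (3), and never recurses into the single-tuple partitions.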
The Star-Cubing method for computing iceberg cubes with a dynamic star-tree structure was introduced by Xin, Han, Li, and Wah [XHLW03]. MM-Cubing, an efficient