Data Cube Technology - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

for tuple count() , or by fetching itemcount() and sum() from the ID measure array to

compute average() ) for the final set of cells.

Example 5.12 Point query. Suppose a user wants to compute the point queryh a 2 , b 1 , c 1 , d 1 ,: count() ? i

for our database in Table 5.4 and that the shell fragments for the partitions ( A , B , C )

and ( D , E ) are precomputed as described in Example 5.10. The query is broken down

into two subqueries based on the precomputed fragments: h a 2 , b 1 , c 1 , , i and h, ,

, d 1 , i. The best-fit precomputed shell fragments for the two subqueries are ABC and

D . The fetch of the TID lists for the two subqueries returns two lists: f4, 5g and f1, 3,

4, 5g. Their intersection is the list f4, 5g, which is of size 2. Thus, the final answer is

count() D 2 .

“How can we use shell fragments to answer a subcube query?” A subcube query returns

a local data cube based on the instantiated and inquired dimensions. Such a data cube

needs to be aggregated in a multidimensional way so that online analytical processing

(drilling, dicing, pivoting, etc.) can be made available to users for flexible manipulation

and analysis. Because instantiated dimensions usually provide highly selective constants

that dramatically reduce the size of the valid TID lists, we should make maximal use of

the precomputed shell fragments by finding the fragments that best fit the set of instan-

tiated dimensions, and fetching and intersecting the associated TID lists to derive the

reduced TID list. This list can then be used to intersect the best-fitting shell fragments

consisting of the inquired dimensions. This will generate the relevant and inquired base

cuboid, which can then be used to compute the relevant subcube on-the-fly using an

efficient online cubing algorithm.

Let the subcube query be of the form h i ,

j , and

p represent a set of instantiated values of dimension A i , A j , and A p , respectively, and A k

and A q represent two inquired dimensions. First, we check the shell fragment schema

to determine which dimensions among (1) A i , A j , and A p , and (2) A k and A q are in

the same fragment partition. Suppose A i and A j belong to the same fragment, as do A k

and A q , but that A p is in a different fragment. We fetch the corresponding TID lists in

the precomputed 2-D fragment for A i and A j using the instantiations

j , A k ? ,

p , A q ? : M ? i, where

i ,

i and

j , then

fetch the TID list on the precomputed 1-D fragment for A p using instantiation

p , and

then fetch the TID lists on the precomputed 2-D fragments for A k and A q , respectively,

using no instantiations (i.e., all possible values). The obtained TID lists are intersected

to derive the final TID lists, which are used to fetch the corresponding measures from

the ID measure array to derive the “base cuboid” of a 2-D subcube for two dimensions

. A fast cube computation algorithm can be applied to compute this 2-D cube

based on the derived base cuboid. The computed 2-D cube is then ready for OLAP

operations.

A k , A q /

Example 5.13 Subcube query. Suppose that a user wants to compute the subcube query, h a 2 , b 1 , ? ,

, ? : count

? i, for our database shown earlier in Table 5.4, and that the shell fragments

have been precomputed as described in Example 5.10. The query can be broken into

three best-fit fragments according to the instantiated and inquired dimensions: AB , C ,

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home