Compression Schemes of High Dimensional Data for MOLAP - Evolving Application Domains of Data Warehousing and Mining


Figure 7. Realization of a 2-dimensional extendible array

Compressing by History Offset Compression

In the coordinate method, each element of an n-dimensional extendible array is specified by its n-dimensional coordinate $\langle x_1, x_2, \ldots, x_n \rangle$. In history offset compression, an element is instead specified by the pair of a history value and an offset value of the extendible array. Since a history value is unique and has a one-to-one correspondence with its subarray, the subarray that contains a given element can be referred to uniquely by its history value h. Moreover, the offset value (i.e., logical location) of the element within the subarray can be computed with the addressing function, and it is also unique within the subarray. Therefore, each element of an n-dimensional extendible array can be referenced by specifying the pair (history value, offset value).

In the coordinate method, the length of the coordinate grows in proportion to the dimension of the extendible array. Since an n-column record is referenced by its n-dimensional coordinate $\langle x_1, x_2, \ldots, x_n \rangle$ in the corresponding multidimensional array, the storage required for referencing records becomes large if the dimension is high. In history offset compression, by contrast, the size of the reference stays fixed and short even if the dimension is high.

As data grows incrementally over time, array extension should be a characteristic of MOLAP systems. Rotem and Zhao (1996) pointed out some reasons for extension, such as adding new values to a dimension, a new level of aggregation, or a completely new dimension. History offset compression (Hasan et al., 2006, 2007) allows easy extension, since it is based on extendible arrays: the array can be extended dynamically without reallocating the data that is already stored. The degree of compression of the history offset compression approach depends heavily on the number of dimensions and the length of each dimension, because the size of each subarray is determined by $\prod_{i=1}^{n-1} d_i$, where $n$ is the number of dimensions and the $d_i$ are the dimension lengths.
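The bookkeeping behind a (history value, offset value) reference can be sketched in a few lines. This is a toy model, not the implementation from Hasan et al.: it assumes one history table per dimension and row-major addressing inside each subarray, and the names `ExtendibleArray`, `extend`, and `encode` are invented for illustration.

```python
from math import prod


class ExtendibleArray:
    """Toy n-dimensional extendible array addressed by (history, offset) pairs."""

    def __init__(self, n):
        self.n = n
        self.sizes = [1] * n                      # current length of each dimension
        # history[d][x]: history value of the subarray that introduced index x of dimension d
        self.history = [[0] for _ in range(n)]
        # subarrays[h]: (extended dimension, lengths of the other dimensions at that time)
        self.subarrays = {0: (0, [1] * (n - 1))}
        self.counter = 0                          # last history value handed out

    def extend(self, dim):
        """Grow dimension `dim` by one and return the size of the new subarray."""
        self.counter += 1
        shape = [s for d, s in enumerate(self.sizes) if d != dim]
        self.subarrays[self.counter] = (dim, shape)
        self.history[dim].append(self.counter)
        self.sizes[dim] += 1
        return prod(shape)                        # product of the other n-1 lengths

    def encode(self, coord):
        """Map an n-dimensional coordinate to its (history value, offset) pair."""
        # The element lives in the most recently created of its n candidate subarrays.
        h = max(self.history[d][x] for d, x in enumerate(coord))
        dim, shape = self.subarrays[h]
        rest = [x for d, x in enumerate(coord) if d != dim]
        offset = 0
        for x, s in zip(rest, shape):             # row-major addressing inside the subarray
            offset = offset * s + x
        return h, offset


arr = ExtendibleArray(2)
for dim in (0, 1, 0, 1):                          # grow to 3 x 3, alternating dimensions
    arr.extend(dim)
print(arr.encode((2, 1)))                         # (3, 1)
```

Whatever the value of `n`, `encode` returns two numbers, whereas the coordinate itself has `n` components. Existing subarrays are never touched by `extend`, which mirrors the claim that extension needs no reallocation of stored data.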

If $n$ and the $d_i$ are large, then the size of the subarray overflows the address space even on 64-bit machines. Moreover, on a $k$-bit processor, if $b$ bits are used for storing history values and the remaining $k-b$ bits are used to store offset values in history offset compression, then only $2^b$ distinct history values and $2^{k-b}$ distinct offset values can be represented. These are small numbers with respect to large data warehouses.
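To make these limits concrete, the following sketch packs a (history value, offset value) pair into a single k-bit word and checks a subarray size against the address space. The split `k=64`, `b=16`, the function names, and the example dimension lengths are assumptions chosen for illustration; the text does not prescribe particular values.

```python
from math import prod


def pack(history, offset, k=64, b=16):
    """Pack a pair into one k-bit integer: b high bits of history, k-b low bits of offset."""
    offset_bits = k - b
    if not 0 <= history < (1 << b):
        raise OverflowError("history value does not fit in b bits")
    if not 0 <= offset < (1 << offset_bits):
        raise OverflowError("offset value does not fit in k-b bits")
    return (history << offset_bits) | offset


def unpack(word, k=64, b=16):
    """Recover the (history, offset) pair from a packed k-bit integer."""
    offset_bits = k - b
    return word >> offset_bits, word & ((1 << offset_bits) - 1)


def subarray_size(dim_lengths):
    """Product of the first n-1 dimension lengths."""
    return prod(dim_lengths[:-1])


word = pack(3, 1)
print(unpack(word))                           # (3, 1)

# Hypothetical cube: 10 dimensions of length 1000 each.
size = subarray_size([1000] * 10)
print(size > 2 ** 64)                         # True: exceeds a 64-bit address space
print(size > 2 ** (64 - 16))                  # True: exceeds the 48-bit offset field
```

Giving more bits to the offset field shrinks the number of extensions that can be recorded, and vice versa, so no choice of `b` removes both limits at once.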

