some special treatment is done, the array size will overflow soon, even if the k bits are used for both the history and offset values. For a query of the form "column name < value", "column name > value", or "column name between value1 and value2", it is necessary to visit different subarrays of the extendible array, whereas in traditional multidimensional arrays the records are organized linearly in a single stream; hence the query time of traditional arrays outperforms that of the extendible array. When the compression scheme is applied, however, the retrieval time is very similar in both cases (Hasan et al., 2007), while there is a gain in extension capability when history-offset compression is used.
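The extension behavior discussed above can be made concrete with a small sketch. The following is a minimal, hypothetical illustration of history-offset addressing in a two-dimensional extendible array (the class and method names are our own, not from the cited work): each extension along a dimension allocates one contiguous subarray tagged with a global history counter, and a cell (i, j) is located in the subarray whose history value is the larger of its row's and its column's.

```python
class ExtendibleArray2D:
    """Sketch of history-offset addressing for a 2-D extendible array.

    Hypothetical illustration: each extension along a dimension
    allocates one contiguous subarray tagged with a monotonically
    increasing "history" counter.
    """

    def __init__(self):
        self.history = 0
        self.row_hist = []   # history value recorded when each row was added
        self.col_hist = []   # history value recorded when each column was added
        self.subarrays = {}  # history value -> contiguous storage for that extension

    def extend_row(self):
        self.history += 1
        self.row_hist.append(self.history)
        # the new row spans all columns that exist so far
        self.subarrays[self.history] = [None] * len(self.col_hist)

    def extend_col(self):
        self.history += 1
        self.col_hist.append(self.history)
        # the new column spans all rows that exist so far
        self.subarrays[self.history] = [None] * len(self.row_hist)

    def _locate(self, i, j):
        # the cell belongs to whichever extension happened later
        if self.row_hist[i] > self.col_hist[j]:
            return self.subarrays[self.row_hist[i]], j
        return self.subarrays[self.col_hist[j]], i

    def set(self, i, j, value):
        sub, off = self._locate(i, j)
        sub[off] = value

    def get(self, i, j):
        sub, off = self._locate(i, j)
        return sub[off]
```

Because every extension is a separate contiguous subarray, the array can grow along either dimension without relocating existing data, which is the extension-capability gain noted above; the price is that a range query must visit several subarrays rather than one linear stream.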
ADDITIONAL RELATED WORK AND PERFORMANCE ISSUES

In this section we briefly review other related work on MOLAP compression and performance.

The performance of accessing elements of large multidimensional arrays is strongly affected by the order of the dimensions, since the array is stored in a predetermined order of dimensions or axes. A multidimensional array with n dimensions can be accessed, or linearized, in n! different ways (Rotem & Zhao, 1996). Hence, the performance of computing an aggregation over a slice of an array may differ greatly depending on which slice operation is performed (Shimada et al., 2006; Seamons & Marianne, 1994).

To cope with this dimension dependency, array chunking can be employed (Zhao et al., 1997) so that all dimensions receive uniform treatment. An n-dimensional array is divided into small n-dimensional chunks, and each chunk is stored as a contiguous page on secondary storage. Chunking moderates the dimension dependency, but chunk size and shape have a great effect: large chunk sizes may cause unnecessary data to be read for queries with small result sets, whereas small chunk sizes may require more disk accesses to retrieve all the chunks needed to answer a query. Moreover, the chunk shape influences the number of chunks retrieved in answering a query. Zhao et al. (1997) show that the optimal dimension order for chunking multidimensional arrays is d1 ≤ d2 ≤ d3 ≤ ... ≤ dn, where di denotes the size of dimension i.

In order to achieve a good compression rate for data cubes, image compression approaches can also be applied, where simple blocks are compressed using common image compression formats, including JPEG (David et al., 1998). The images are divided into arbitrarily shaped regions through adaptive (Pennec & Mallat, 2000) or non-adaptive algorithms (Emmanuel & David, 1999). It is not clear a priori that more complex shapes or algorithms lead to more efficient storage. Another argument for block-coded data cubes is that many efficient buffering schemes for OLAP range queries themselves rely on block coding (Geffner et al., 1999; Lemire, 2002).

Bitmap indices involve two important aspects, namely the basic bitmap index and bitmap index compression (Stabno & Wrembel, 2007). Besides basic bitmap compression, there are also bitmap compression schemes such as Byte-aligned Bitmap Compression (Antoshenkov & Ziauddin, 1996), Word-Aligned Hybrid (Wu et al., 2004; Stockinger & Wu, 2007), and Approximate Encoding (Apaydin et al., 2006). The first two techniques use the basic RLE compression algorithm (discussed in this chapter), in which a continuous run of bits with the same value (either "0" or "1") is represented as one occurrence of the value together with a count of its repetitions. Byte-aligned Bitmap Compression divides a bitmap into 8-bit words, whereas Word-Aligned Hybrid divides a bitmap into 31-bit words. Both techniques offer the best compression ratio for bitmaps describing rows ordered by the value of an indexed attribute. The
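The effect of chunk shape on the number of chunks retrieved can be illustrated with a short counting sketch (the helper below is hypothetical, not from the cited work): for an axis-aligned range query, the number of chunks touched is the product, over the dimensions, of the number of chunk intervals the query overlaps.

```python
from math import prod  # Python 3.8+

def chunks_touched(lo, hi, chunk_shape):
    """Chunks overlapped by the inclusive range [lo, hi] when the
    array is partitioned into chunks of shape `chunk_shape`
    (hypothetical helper for illustration)."""
    return prod(hi[d] // c - lo[d] // c + 1
                for d, c in enumerate(chunk_shape))

# Two chunk shapes with the same volume (64 cells) against one query:
lo, hi = (0, 0), (15, 3)
aligned = chunks_touched(lo, hi, (16, 4))     # chunk shape matches the query
transposed = chunks_touched(lo, hi, (4, 16))  # same volume, rotated shape
```

With the chunk shape aligned to the query, a single chunk read suffices, while the transposed shape forces four chunk reads for the same answer, which is the shape effect noted above.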
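The n! figure can be seen from a small sketch (the function below is an illustrative helper, not from the cited work): each permutation of the dimensions yields a different linearization, that is, a different mapping from an n-dimensional index to a linear offset.

```python
from itertools import permutations

def linear_offset(index, shape, order):
    """Linear offset of `index` in an array of `shape` stored under
    dimension `order`, outermost dimension first (hypothetical helper
    for illustration)."""
    off = 0
    for d in order:
        off = off * shape[d] + index[d]
    return off

shape = (2, 3, 4)  # three dimensions, so 3! = 6 distinct linearizations
offsets = [linear_offset((1, 0, 2), shape, order)
           for order in permutations(range(3))]
```

The same cell lands at a different offset under most orders, so a slice that is contiguous under one linearization is scattered under another; this is why the cost of slice-wise aggregation depends on the chosen dimension order.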