In the Approximate Encoding technique, a set of bitmaps is treated as a Boolean matrix. The matrix is constructed in a compressed form called an Approximate Bitmap, where multiple hash functions encode the Boolean matrix into the Approximate Bitmap. For each vector of bits in the matrix, a hashing string hs is constructed as a function of the row number and the column number in the matrix. The positions pointed to by the hash values are set to "1" in the Approximate Bitmap. The compression ratio of bitmaps can be further improved by reordering the columns of the bitmap matrices; this reordering is an NP-complete problem, so only ordering heuristics can be applied.
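The encoding described above can be sketched as a Bloom-filter-style structure: each set bit (row, col) turns on several hashed positions in one shared bit array. The class name, array size, and hash construction below are illustrative assumptions, not the authors' implementation.

```python
import hashlib

def _hash_positions(row, col, num_hashes, size):
    """Derive num_hashes bit positions from the hashing string "row:col"
    (hs as a function of the row and column numbers)."""
    hs = f"{row}:{col}"
    return [int(hashlib.sha256(f"{i}:{hs}".encode()).hexdigest(), 16) % size
            for i in range(num_hashes)]

class ApproximateBitmap:
    """Lossy, compressed encoding of a Boolean matrix.  Every set bit
    sets k hashed positions in one shared bit array, so lookups may
    return false positives but never false negatives."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)  # one byte per bit, for clarity

    def set(self, row, col):
        for p in _hash_positions(row, col, self.num_hashes, self.size):
            self.bits[p] = 1

    def probably_set(self, row, col):
        return all(self.bits[p]
                   for p in _hash_positions(row, col, self.num_hashes, self.size))
```

As with a Bloom filter, a larger bit array or more hash functions trades space for a lower false-positive rate.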
The authors of (Stabno and Wrembel, 2007) show that Run-Length Huffman (RLH) performs better than Word-Aligned Hybrid and Byte-aligned Bitmap Compression because the Huffman tree is stored in main memory. Although decompressing bitmaps compressed with RLH requires more operations than Word-Aligned Hybrid or Byte-aligned Bitmap Compression, the in-memory tree compensates for this cost. One major drawback of bitmap compression schemes is the cost of updating the bitmap. To modify a bitmap compressed by RLH, the whole bitmap must be decompressed, the frequencies modified, and the bitmap compressed again. All these operations are necessary because updates change the frequencies of the distances, and hence the run-length encoding changes as well. To improve the efficiency of bitmap updating, a modified RLH called RLH-1024 is also proposed by (Stabno and Wrembel, 2007).
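The RLH idea above can be sketched in two stages: run-length encode the distances between consecutive set bits, then Huffman-code those distances by frequency. This is a minimal illustration of the principle, not the RLH algorithm as published.

```python
import heapq
from collections import Counter

def distances(bitmap):
    """Gap encoding: distances between consecutive 1-bits."""
    out, last = [], -1
    for i, b in enumerate(bitmap):
        if b:
            out.append(i - last)
            last = i
    return out

def huffman_codes(symbols):
    """Build a prefix-free Huffman code table from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                     # degenerate one-symbol case
        return {next(iter(freq)): "0"}
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def rlh_compress(bitmap):
    gaps = distances(bitmap)
    codes = huffman_codes(gaps)
    return "".join(codes[g] for g in gaps), codes

def rlh_decompress(bits, codes, length):
    inv = {c: s for s, c in codes.items()}
    out, buf, pos = [0] * length, "", -1
    for b in bits:
        buf += b
        if buf in inv:                     # prefix-free: first match is correct
            pos += inv[buf]
            out[pos] = 1
            buf = ""
    return out
```

The update drawback discussed in the text is visible here: changing one bit alters the gap frequencies, which can change the whole Huffman table, forcing a full decompress/recompress cycle.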
Kaser & Daniel (2003) employ block-coded data cubes for Hybrid OLAP (HOLAP) compression, focusing on two important OLAP operations: slice and normalization. A slice fixes one of the attributes, while a normalization can be viewed as a tuple of permutations. They also show that MOLAP is more efficient for very dense blocks, whereas ROLAP is more efficient for very sparse blocks. There is also work on the compression of ROLAP datasets (Dehne et al., 2001; Ng & Chinya, 1997; Sismanis et al., 2002).
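The two operations can be illustrated on a sparse cube stored as a coordinate-to-value map; the function names and representation below are assumptions for illustration, not the authors' data structures.

```python
def slice_cube(cells, dim, value):
    """Slice: fix one attribute at a given value and drop that coordinate
    from the surviving cells."""
    return {tuple(c for i, c in enumerate(coord) if i != dim): v
            for coord, v in cells.items()
            if coord[dim] == value}

def normalize(cells, perms):
    """Normalization viewed as a tuple of permutations: one permutation
    per dimension relabels that dimension's indices, which can cluster
    nonzero cells into dense chunks."""
    return {tuple(perms[i][c] for i, c in enumerate(coord)): v
            for coord, v in cells.items()}
```

A good choice of `perms` moves the nonzero cells next to each other, which is exactly the dense-chunk formation discussed in the next section.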
TRENDS IN MOLAP COMPRESSION
The state-of-the-art trends in MOLAP compression and implementation can be summarized as follows:
Implementing the dense part of the array as MOLAP and the sparse part as ROLAP. This approach is termed Hybrid OLAP (HOLAP). The dense chunks are compressed for the MOLAP implementation and the sparse chunks are compressed for the ROLAP implementation; for higher dimensions, the array is also considered at multiple levels. Vendors such as Speedware and Microsoft apply this approach. Kaser & Daniel (2003) show that for large multidimensional arrays, proper normalization can lead to more efficient storage in HOLAP contexts that store dense and sparse chunks differently. An uncompressed multidimensional array may be sparse in some areas and have dense clusters of data in others (Owen, 2002). If certain chunks of the array are sufficiently dense, those chunks are stored more efficiently than in the sparse representation. The formation of dense chunks is therefore important, as it allows better array compression. If the normalization and the selection of chunk boundaries are done carefully, the data can be well clustered; however, this is an NP-hard problem even for the simple two-dimensional case, being of the Maximum-Edge-Biclique form (Owen, 2002). Rotem et al. (2007) treat each chunk as a dense array. Block addressing is done at two levels: the first level locates the block that an element belongs to.
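The two-level addressing step can be sketched with simple integer arithmetic: the block index comes from dividing each coordinate by the chunk size, and the within-block offset from the remainder. The function name and fixed chunk shape are illustrative assumptions.

```python
def block_address(coord, chunk_shape):
    """Two-level block addressing: first locate the block containing the
    element (integer division), then the element's offset inside that
    block (remainder)."""
    block = tuple(c // s for c, s in zip(coord, chunk_shape))
    offset = tuple(c % s for c, s in zip(coord, chunk_shape))
    return block, offset
```

Only the first-level lookup then needs a (possibly sparse) block directory; within a dense block, the offset indexes a plain array.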