In the Approximate Encoding technique, a set of bitmaps is treated as a Boolean matrix. The matrix is constructed in a compressed form called an Approximate Bitmap, where multiple hash functions encode the Boolean matrix into the Approximate Bitmap. For each vector of bits in the matrix, a hashing string hs is constructed as a function of the row number and the column number in the matrix. The positions pointed to by the hash values are set to "1" in the Approximate Bitmap. The compression ratio of bitmaps can be further improved by reordering the columns of the bitmap matrices; this reordering is an NP-complete problem, so only ordering heuristics can be applied.
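The encoding described above can be sketched as a Bloom-filter-style structure: each set bit (row, col) turns on several hashed positions in one shared bit array. The class name, array size, and hash construction below are illustrative assumptions, not the authors' implementation.

```python
import hashlib

def _hash_positions(row, col, num_hashes, size):
    """Derive num_hashes bit positions from the hashing string "row:col"
    (hs as a function of the row and column numbers)."""
    hs = f"{row}:{col}"
    return [int(hashlib.sha256(f"{i}:{hs}".encode()).hexdigest(), 16) % size
            for i in range(num_hashes)]

class ApproximateBitmap:
    """Lossy, compressed encoding of a Boolean matrix.  Every set bit
    sets k hashed positions in one shared bit array, so lookups may
    return false positives but never false negatives."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)  # one byte per bit, for clarity

    def set(self, row, col):
        for p in _hash_positions(row, col, self.num_hashes, self.size):
            self.bits[p] = 1

    def probably_set(self, row, col):
        return all(self.bits[p]
                   for p in _hash_positions(row, col, self.num_hashes, self.size))
```

As with a Bloom filter, a larger bit array or more hash functions trades space for a lower false-positive rate.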
The authors of (Stabno and Wrembel, 2007) show that Run-Length Huffman (RLH) performs better than Word-Aligned Hybrid and Byte-aligned Bitmap Compression because the Huffman tree is stored in main memory. Although decompressing bitmaps compressed with RLH requires more operations than Word-Aligned Hybrid or Byte-aligned Bitmap Compression, the in-memory tree compensates for this cost. One major drawback of bitmap compression schemes is the cost of updating the bitmap. To modify a bitmap compressed by RLH, the whole bitmap must be decompressed, the frequencies modified, and the bitmap compressed again. All these operations are necessary because updates change the frequencies of the distances, and hence the run-length encoding changes as well. To improve the efficiency of bitmap updating, a modified RLH called RLH-1024 is also proposed by (Stabno and Wrembel, 2007).
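The RLH idea above can be sketched in two stages: run-length encode the distances between consecutive set bits, then Huffman-code those distances by frequency. This is a minimal illustration of the principle, not the RLH algorithm as published.

```python
import heapq
from collections import Counter

def distances(bitmap):
    """Gap encoding: distances between consecutive 1-bits."""
    out, last = [], -1
    for i, b in enumerate(bitmap):
        if b:
            out.append(i - last)
            last = i
    return out

def huffman_codes(symbols):
    """Build a prefix-free Huffman code table from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                     # degenerate one-symbol case
        return {next(iter(freq)): "0"}
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def rlh_compress(bitmap):
    gaps = distances(bitmap)
    codes = huffman_codes(gaps)
    return "".join(codes[g] for g in gaps), codes

def rlh_decompress(bits, codes, length):
    inv = {c: s for s, c in codes.items()}
    out, buf, pos = [0] * length, "", -1
    for b in bits:
        buf += b
        if buf in inv:                     # prefix-free: first match is correct
            pos += inv[buf]
            out[pos] = 1
            buf = ""
    return out
```

The update drawback discussed in the text is visible here: changing one bit alters the gap frequencies, which can change the whole Huffman table, forcing a full decompress/recompress cycle.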
Kaser & Daniel (2003) employ block-coded data cubes for Hybrid OLAP (HOLAP) compression, focusing on two important OLAP operations: slice and normalization. A slice fixes one of the attributes, while a normalization can be viewed as a tuple of permutations. They also show that MOLAP is more efficient for very dense blocks, whereas ROLAP is more efficient for very sparse blocks. There is also work on the compression of ROLAP datasets (Dehne et al., 2001; Ng & Chinya, 1997; Sismanis et al., 2002).
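The two operations can be illustrated on a sparse cube stored as a coordinate-to-value map; the function names and representation below are assumptions for illustration, not the authors' data structures.

```python
def slice_cube(cells, dim, value):
    """Slice: fix one attribute at a given value and drop that coordinate
    from the surviving cells."""
    return {tuple(c for i, c in enumerate(coord) if i != dim): v
            for coord, v in cells.items()
            if coord[dim] == value}

def normalize(cells, perms):
    """Normalization viewed as a tuple of permutations: one permutation
    per dimension relabels that dimension's indices, which can cluster
    nonzero cells into dense chunks."""
    return {tuple(perms[i][c] for i, c in enumerate(coord)): v
            for coord, v in cells.items()}
```

A good choice of `perms` moves the nonzero cells next to each other, which is exactly the dense-chunk formation discussed in the next section.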
TRENDS IN MOLAP COMPRESSION
The state-of-the-art trends in MOLAP compression and implementation can be summarized as follows:
Implementing the dense part of the array as MOLAP and the sparse part as ROLAP. This approach is termed Hybrid OLAP (HOLAP). The dense chunks are compressed for the MOLAP implementation and the sparse chunks are compressed for the ROLAP implementation; for higher dimensions, the array is also considered at multiple levels. Vendors such as Speedware and Microsoft apply this approach. Kaser & Daniel (2003) show that for large multidimensional arrays, proper normalization can lead to more efficient storage in HOLAP contexts that store dense and sparse chunks differently. An uncompressed multidimensional array may be sparse in some areas and have dense clusters of data in others (Owen, 2002). If certain chunks of the array are sufficiently dense, those chunks are stored more efficiently than in the sparse representation. The formation of dense chunks is therefore important, as it allows better array compression. If the normalization and the selection of chunk boundaries are done carefully, the data can be well clustered; however, this is an NP-hard problem even for the simple two-dimensional case, being of the Maximum-Edge-Biclique form (Owen, 2002). Rotem et al. (2007) treat each chunk as a dense array. Block addressing is done at two levels: the first level locates the block that an element belongs to.
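The two-level addressing step can be sketched with simple integer arithmetic: the block index comes from dividing each coordinate by the chunk size, and the within-block offset from the remainder. The function name and fixed chunk shape are illustrative assumptions.

```python
def block_address(coord, chunk_shape):
    """Two-level block addressing: first locate the block containing the
    element (integer division), then the element's offset inside that
    block (remainder)."""
    block = tuple(c // s for c, s in zip(coord, chunk_shape))
    offset = tuple(c % s for c, s in zip(coord, chunk_shape))
    return block, offset
```

Only the first-level lookup then needs a (possibly sparse) block directory; within a dense block, the offset indexes a plain array.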