Compression Schemes of High Dimensional Data for MOLAP - Evolving Application Domains of Data Warehousing and Mining


There are two types of compression: lossless and lossy. In lossless data compression, the decompressed data is an exact replica of the original data. In lossy data compression, by contrast, the decompressed data may differ from the original data; typically, there is some distortion between the original and the reproduced data. Data compression must be lossless for typical MOLAP applications.
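
The distinction can be illustrated with a small round trip. This is only a sketch: the measure values are invented for illustration, and `zlib` is used as a generic lossless compressor, not as one of the MOLAP schemes reviewed in this chapter. Rounding to the nearest hundred stands in for a lossy transform.

```python
import zlib

# Hypothetical measure values (invented for illustration only).
values = [120, 0, 0, 0, 75, 0, 0, 310]
original = ",".join(str(v) for v in values).encode("ascii")

# Lossless: decompression reproduces the input byte for byte.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
is_lossless = restored == original

# Lossy (for contrast): rounding to the nearest hundred discards detail,
# so the original values can no longer be recovered exactly.
lossy = [round(v, -2) for v in values]
is_lossy_exact = lossy == values
```

An aggregate such as a sum computed over `lossy` would generally differ from the sum over `values`, which is why a MOLAP store cannot tolerate the second kind of transform.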

In this chapter we review MOLAP compression schemes, discuss important issues related to MOLAP compression and existing techniques, and discuss future trends. The chapter is structured as follows. Section 2 describes the compression mechanisms of many existing MOLAP compression schemes. Section 3 reviews other related work in MOLAP compression. Section 4 discusses some relevant quality issues in MOLAP compression and existing compression schemes. Section 5 discusses future trends, Section 6 discusses some limitations of compression schemes, and Section 7 concludes the chapter.


Compression techniques usually provide two mappings. One is forward mapping, which computes the location in the compressed dataset given a position in the original dataset. The other is backward mapping, which computes the position in the original dataset given a location in the compressed dataset. A compression method is called mapping-complete if it provides both forward mapping and backward mapping. The terms logical database and physical database refer to the uncompressed and compressed database, respectively.
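
The two mappings can be sketched with a deliberately simple scheme. The class below is a hypothetical illustration, not one of the schemes named in this chapter: it assumes the logical database is a linear array in which `None` marks an empty cell, and it keeps only the non-empty cells in the physical database, together with their sorted logical positions.

```python
from bisect import bisect_left


class SparseCompressed:
    """Physical database holding only the non-empty cells of a logical array."""

    def __init__(self, logical):
        # Logical positions of the non-empty cells, in increasing order.
        self.positions = [i for i, v in enumerate(logical) if v is not None]
        # Values stored at the matching physical locations.
        self.values = [logical[i] for i in self.positions]

    def forward(self, logical_pos):
        """Logical position -> physical location, or None for an empty cell."""
        k = bisect_left(self.positions, logical_pos)
        if k < len(self.positions) and self.positions[k] == logical_pos:
            return k
        return None

    def backward(self, physical_loc):
        """Physical location -> logical position."""
        return self.positions[physical_loc]


logical = [None, 5, None, None, 8, 2, None, 9]
db = SparseCompressed(logical)
```

Because both `forward` and `backward` are available, this toy scheme is mapping-complete in the sense defined above: `db.backward(db.forward(p))` returns `p` for every non-empty logical position `p`.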

Some mapping-complete compression schemes, such as header compression, BAP compression, run-length encoding, and bitmap compression, first transform the multidimensional data into a linearized array using the array linearization function. The linearized data are then compressed by a mapping-complete compression method. Li and Srivastava (2002) applied this idea to implement compressed MOLAP using the header compression method. Hence, these mapping-complete compression schemes are used for compressing higher-dimensional data sets after the data have been linearized with the array linearization function.
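
The linearize-then-compress pipeline can be sketched as follows. The sketch rests on several assumptions that are not taken from the text: the linearization function is taken to be the usual row-major offset, `0` is used as the marker for an empty cell, the cube contents are invented, and run-length encoding is shown in its simplest (value, run length) form.

```python
from itertools import groupby


def linearize(index, shape):
    """Row-major offset of a multidimensional index (assumed linearization)."""
    offset = 0
    for i, size in zip(index, shape):
        if not 0 <= i < size:
            raise IndexError(f"index {index} is out of range for shape {shape}")
        offset = offset * size + i
    return offset


def delinearize(offset, shape):
    """Inverse of linearize: recover the multidimensional index."""
    index = []
    for size in reversed(shape):
        offset, i = divmod(offset, size)
        index.append(i)
    return tuple(reversed(index))


def run_length_encode(cells):
    """Collapse each run of equal values into a (value, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(cells)]


# Hypothetical sparse 2 x 3 x 4 cube: only two cells hold a measure value.
shape = (2, 3, 4)
cube = {(0, 1, 2): 50, (1, 2, 3): 70}

# Step 1: linearize the multidimensional data (0 marks an empty cell).
linear = [0] * (2 * 3 * 4)
for index, measure in cube.items():
    linear[linearize(index, shape)] = measure

# Step 2: compress the linearized array.
encoded = run_length_encode(linear)
```

Once the data are one-dimensional, the compression step no longer depends on the number of dimensions, which is what lets the same method serve higher-dimensional data sets.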

Compression Schemes for Multidimensional Arrays

Efficiently computing aggregations on compressed data warehouses is crucial once large multidimensional databases are compressed for storage and efficiency reasons. This compression must be lossless for data warehousing applications, so that the original data can be fully recovered from its compressed form. In this section we discuss several compression schemes that are applied to MOLAP. We start by discussing multidimensional array linearization, which may be used as part of many compression schemes. After that we review a set of compression techniques that includes chunk-offset compression, compressed row or column storage, extended Karnaugh map representation, header compression, BAP compression, run-length encoding, bitmap compression, and finally history offset compression.

Multidimensional Array Linearization

Figure 1 is an example of mapping a relational table to a multidimensional array. In a Traditional Multidimensional Array (TMA) based implementation of a MOLAP scheme, the kth column of an n-column relational table is mapped to the kth dimension of the multidimensional array. Each column value is mapped to a unique subscript, and the measure value (i.e., the sales value) of the relational table is inserted into the corresponding cell in the multidimensional array. Therefore, each record of the relation can be expressed as one cell in the multidimensional array, if each column of the relation is assigned to each dimension of the
