multiple sort orders to maximize query performance, rather than to minimize
storage space. The authors propose an architecture that allows for direct operation on compressed data while minimizing the complexity of adding new
compression algorithms. They state: “Compression in traditional databases is
known to improve performance significantly. It reduces the size of the data
and improves I/O performance by reducing seek times (the data are stored
nearer to each other), reducing transfer times (there is less data to transfer),
and increasing buffer hit rate (a larger fraction of the DBMS fits in the buffer
pool). For queries that are I/O limited, the CPU overhead of decompression
is often compensated for by the I/O improvements” 18 (p. 671).
Compression techniques for row-wise stored data often employ dictionary schemes to code attribute values in fewer bits. Sometimes Huffman encoding is used, whereby varying symbol frequencies may be exploited to obtain a more compact total encoding, at the price of variable-length codes. Also considered is frame-of-reference encoding (FOR), where the values in a block of data are expressed as small differences from some frame-of-reference value, such as the block minimum. 18 Run-length encoding (RLE), where repeats of the same element are expressed as (value, run-length) pairs, is presented as an attractive approach for compressing sorted data in a column-wise store.
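The following minimal Python sketch (illustrative only; the helper names are our own, not from the cited work) shows how FOR and RLE represent a block of values:

    # Frame-of-reference (FOR): store a base value plus small offsets.
    def for_encode(values):
        frame = min(values)                      # choose the minimum as the frame
        return frame, [v - frame for v in values]

    def for_decode(frame, offsets):
        return [frame + d for d in offsets]

    # Run-length encoding (RLE): collapse repeats into (value, run-length) pairs.
    def rle_encode(values):
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
            else:
                runs.append((v, 1))              # start a new run
        return runs

    def rle_decode(runs):
        return [v for v, n in runs for _ in range(n)]

    block = [1000, 1003, 1001, 1007]
    frame, offsets = for_encode(block)   # frame = 1000, offsets = [0, 3, 1, 7]
    assert for_decode(frame, offsets) == block

    sorted_column = ["a", "a", "a", "b", "b"]
    runs = rle_encode(sorted_column)     # [("a", 3), ("b", 2)]
    assert rle_decode(runs) == sorted_column

Note that the FOR offsets fit in far fewer bits than the raw values, which is the saving a subsequent bit-packing step realizes.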
An important point made is that if one wants to exploit different compression techniques depending on local properties of the data, one must find ways to avoid an associated increase in code complexity, where each combination of compression types used to represent the arguments of a join operation would otherwise require its own piece of code. 18 The authors give several examples showing how, by using compressed blocks as an intermediate representation of data, operators can operate directly on compressed data whenever possible, degenerating to a lazy decompression scheme when this is not possible. Also, by abstracting general properties of compression techniques and letting operators check these properties, operator code is shielded from having to know the details of how the data are encoded.
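As a rough illustration of this idea (the class and property names below are hypothetical, not the API of the cited system), an aggregation operator might consult an abstract block property and fall back to decompression only when direct operation is impossible:

    class UncompressedBlock:
        one_value = False                 # block property the operator can check
        def __init__(self, values):
            self.values = values
        def decompress(self):
            return self.values

    class RLEBlock:
        one_value = True                  # a single value with a run length
        def __init__(self, value, run_length):
            self.value, self.run_length = value, run_length
        def decompress(self):             # lazy-decompression fallback
            return [self.value] * self.run_length

    def column_sum(blocks):
        total = 0
        for b in blocks:
            if b.one_value:               # operate directly on compressed data
                total += b.value * b.run_length
            else:                         # degenerate to lazy decompression
                total += sum(b.decompress())
        return total

    print(column_sum([RLEBlock(5, 1_000_000), UncompressedBlock([1, 2, 3])]))
    # 5000006, computed without materializing the million-element run

The operator branches only on the advertised property, never on the concrete encoding, so adding a new compression scheme does not multiply the number of operator variants.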
Harizopoulos et al. 9 discuss techniques for performance improvements in read-optimized databases. The authors report on the performance effects of compressing data with three commonly used lightweight compression techniques: dictionary, bit packing, and FOR-delta, the latter a variation of FOR in which each stored value is the difference from the previous value rather than from a common base value.
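A minimal sketch of FOR-delta along these lines (our own illustration, not code from the paper) could look as follows:

    def for_delta_encode(values):
        # store the first value as a base, then successive differences
        base = values[0]
        deltas = [values[i] - values[i - 1] for i in range(1, len(values))]
        return base, deltas

    def for_delta_decode(base, deltas):
        values = [base]
        for d in deltas:
            values.append(values[-1] + d)
        return values

    ids = [1000, 1001, 1003, 1010]
    base, deltas = for_delta_encode(ids)   # base = 1000, deltas = [1, 2, 7]
    assert for_delta_decode(base, deltas) == ids

For sorted or slowly varying columns the deltas stay small, so they pack into very few bits.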
Database compression techniques are discussed from a more general perspective, pointing out that certain statistical properties of data in a DBMS, in particular skew, correlation, and lack of tuple order, may be exploited to achieve additional compression. 40 The authors present a new compression method based on a mix of column and tuple coding, employing Huffman coding, lexicographical sorting, and delta coding. The paper provides a deeper performance analysis of its methods than has been customary, making it an important contribution also in this context, although it does not specifically focus on column-wise storage. Finally, it briefly discusses how to perform certain operations,