Database Reference
In-Depth Information
projections (expression calculation) do this easily, but Boncz et al. 31 try
to achieve the same for other query processing operators as well (e.g.,
aggregation). This allows compilers to produce ecient loop-pipelined
code.
7.2.4 Data Compression
There are two obvious ways a DBMS can trade CPU cycles to save disk
space and thereby I/O bandwidth, a precious resource. 17 In this context, we
are concerned with I/O bandwidth only, since disk space itself has recently
become so cheap that its cost rarely matters at all except perhaps in extreme
applications, such as large-scale Web search. 34 First, data may be coded into
a more compact form. 8 For example, if one is storing an attribute that is
a U.S. customer's state of residence, the state can be coded directly into 6
bits, whereas the standard two-character abbreviation requires 16 bits and a
variable-length character string for the full name of the state requires many
more. Second, data values may be packed compactly by storing N values in
K*N bits, where K is the smallest byte size that can hold any value in the
column ( bit packing ). Of course, more sophisticated schemes may be used,
which can save even more space while allowing for flexible updates with data
items which require more than K bits. It is also possible to use additional
techniques, in particular sort order, to save additional I/O bandwidth. Note
that these simple data compression techniques are equally applicable to row-
wise as to column-wise storage schemes. Therefore, when a query is processed
using a column-wise storage scheme, what makes the basic difference with
respect to I/O bandwidth is the fact that there is no need to transfer data
from irrelevant columns into main memory. As we will see, however, additional,
more sophisticated compression techniques may also be exploited for column-
wise storage.
Although Abadi et al. 18 state that “it was not until the 90s when researchers
began to concentrate on how compression affects database performance,” the
two early papers 16 , 35 both emphasize in different ways that fast data access
may be achieved through judicious use of data compression in fully trans-
posed ( a
vertically partitioned) files. Early tutorials on the subject in-
clude Severance, 36 Bassiouni, 37 and Roth and Van Horn; 38 however, none of
them specifically discusses how to achieve query evaluation-speed improve-
ments. Recent papers on the use of data compression in relational databases
are Westmann et al., 39 Abadi et al., 18 and Raman and Swart. 40 The most sig-
nificant advantages are typically obtained when combining data compression
with fully transposed file or DSM storage, but there are also proposals to use
compression in row-oriented storage schemes.
Abadi et al. 18 discuss an extension of the column-store storage and access
subsystem of the C-Store system, 8 while also addressing the issue of querying
compressed data. The extended column storage exploits the fact that “sorted
data is usually quite compressible” 18 (p. 671), and suggests storing columns in
.
k
.
a
.
Search WWH ::




Custom Search