7.1.3 Architectural Opportunities
Today, there is a growing interest in what has been called read-optimized
database systems [8, 9], that is, systems that are oriented toward ad hoc
querying of large amounts of data that require little or no updating. Data
warehouses represent one class of read-optimized systems, in which bulk loads
of new data are periodically carried out, followed by a relatively long period
of read-only querying. Early interest in this class of systems came from
various statistical and scientific applications, such as epidemiological,
pharmacological, and other data analytical studies in medicine [10], as well as
intelligence analysis applications [11]. Transposed files, as vertical storage
schemes were usually called at the time, were used in a number of early
nonrelational read-optimized database systems. A fairly comprehensive list of
such systems was given in Copeland and Khoshafian [12], which asserted that the
standard tabular scheme for storage of relations is not necessarily the best,
and that transposed files can offer many advantages.
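To make the contrast between the standard tabular scheme and a transposed file concrete, here is a minimal Python sketch. The relation, its attribute names, and the `transpose` helper are hypothetical illustrations and are not taken from any of the systems cited; a real system would store each column in its own file or disk region rather than in an in-memory list.

```python
# Hypothetical relation stored row-wise (the standard tabular scheme):
# each record keeps all of its attribute values together.
attributes = ("patient_id", "city", "age")
rows = [
    (1, "Oslo", 34),
    (2, "Lima", 51),
    (3, "Oslo", 29),
]


def transpose(rows, attributes):
    """Return a transposed file: one list of values per attribute."""
    columns = {name: [] for name in attributes}
    for row in rows:
        for name, value in zip(attributes, row):
            columns[name].append(value)
    return columns


transposed = transpose(rows, attributes)

# A query over a single attribute touches only that attribute's values;
# the other columns are never read.
ages = transposed["age"]
mean_age = sum(ages) / len(ages)
```

In the row-wise layout the same query would have to step through every complete record, even though it uses only one of the three attributes.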
In the field of database technology during the 1970s and early 1980s, there
was little consensus on how to perform experiments, or even on what to measure
while performing them. Today's experiments and analyses are usually far better
planned and executed, and the accumulated scientific knowledge in database
technology is vastly greater. There is now very good evidence that vertical
database systems can offer substantial performance advantages, in particular
when used in those statistical and analytical kinds of applications for which
the concept was originally developed (cf. Section 7.5).
An important conceptual step associated with the use of transposed files
is that a whole range of new architectural opportunities opens. Below is
a partial list of such architectural opportunities. We note, however, that
only a subset of these techniques has been widely used in systems currently
available:
• column-wise storage of data, in place of the conventional row-wise data
  storage layout used in most relational database management systems
  (RDBMS), can eliminate unnecessary data access if only a subset of the
  columns is involved in the query
• clustering, in particular sort ordering, of attribute values can speed up
  searches over column data
• various kinds of lightweight data compression (minimum byte size,
  dictionary encoding, differencing of attribute value sequences) can reduce
  the amount of data transferred from disk to memory
• run-length encoding (RLE) data compression for columns that are ordered
  can reduce the amount of data fetched from disk into main memory
• dynamically optimized sequential combinations of different compression
  techniques can reduce processing time
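Three of the lightweight compression techniques named in the list can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and sample data are assumptions made for the example, and real systems operate on packed binary blocks rather than Python lists.

```python
from itertools import groupby


def dictionary_encode(values):
    """Dictionary encoding: replace each value with a small integer code."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def delta_encode(values):
    """Differencing: keep the first value, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by accumulating a running sum."""
    decoded, total = [], 0
    for delta in deltas:
        total += delta
        decoded.append(total)
    return decoded


def rle_encode(values):
    """Run-length encoding: collapse each run into a (value, length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Invert rle_encode by repeating each value `length` times."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical sorted column: sort order produces long runs, which suit RLE.
city = ["Lima", "Lima", "Lima", "Oslo", "Oslo", "Rome"]

# A sequential combination: dictionary encoding followed by RLE on the codes.
dictionary, codes = dictionary_encode(city)
runs = rle_encode(codes)

# Hypothetical slowly increasing sequence: differences are small numbers.
timestamps = [1000, 1003, 1004, 1010]
deltas = delta_encode(timestamps)
```

Here the six city values shrink to a three-entry dictionary plus three runs, and the timestamps to one base value plus three small differences that would fit in fewer bytes. Which combination pays off depends on the data, which is why the list above speaks of choosing combinations dynamically.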