Emerging Database Systems in Support of Scientific Data - Scientific Data Management


7.1.3 Architectural Opportunities

Today, there is a growing interest in what has been called read-optimized database systems,8,9 that is, systems that are oriented toward ad hoc querying of large amounts of data that require little or no updating. Data warehouses represent one class of read-optimized systems, in which bulk loads of new data are periodically carried out, followed by a relatively long period of read-only querying. Early interest in this class of systems came from various statistical and scientific applications, such as epidemiological, pharmacological, and other data analytical studies in medicine,10 as well as intelligence analysis applications.11 Transposed files, as vertical storage schemes were usually called at the time, were used in a number of early nonrelational read-optimized database systems. A fairly comprehensive list of such systems was given in Copeland and Khoshafian,12 which asserted that the standard tabular scheme for storage of relations is not necessarily the best, and that transposed files can offer many advantages.
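
The transposed-file idea can be illustrated with a small Python sketch. It assumes a toy in-memory table with hypothetical attributes (patient_id, age, dose_mg) and uses the count of stored values touched as a rough stand-in for data read from disk; an actual vertical system would keep each column in its own file or pages. Averaging one attribute reads every field of every record in the row-wise layout, but only one column in the transposed layout.

```python
from typing import Dict, List, Tuple

# Toy sketch: a hypothetical table of (patient_id, age, dose_mg) records.
# The count of stored values touched is a rough proxy for data read from disk.
Row = Tuple[int, int, float]

def transpose(rows: List[Row], names: List[str]) -> Dict[str, List]:
    """Build a transposed file: one list (column) per attribute."""
    columns: Dict[str, List] = {name: [] for name in names}
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns

def mean_row_store(rows: List[Row], index: int) -> Tuple[float, int]:
    """Average one attribute in row-wise storage, where whole records are read."""
    touched = 0
    total = 0.0
    for row in rows:
        touched += len(row)  # every field of the record is fetched
        total += row[index]
    return total / len(rows), touched

def mean_column_store(columns: Dict[str, List], name: str) -> Tuple[float, int]:
    """Average one attribute in column-wise storage, reading only its column."""
    column = columns[name]
    return sum(column) / len(column), len(column)

rows = [(1, 34, 10.0), (2, 51, 20.0), (3, 47, 15.0), (4, 62, 25.0)]
columns = transpose(rows, ["patient_id", "age", "dose_mg"])
print(mean_row_store(rows, 1))            # (48.5, 12)
print(mean_column_store(columns, "age"))  # (48.5, 4)
```

Both layouts give the same answer; the difference is how much stored data is touched, and that gap grows with the number of attributes a query does not reference.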

In the field of database technology during the 1970s and early 1980s, there was little consensus on how to perform experiments, or even on what to measure while performing them. Today's experiments and analyses are usually far better planned and executed, and the accumulated scientific knowledge in database technology is vastly greater. There is now strong evidence that vertical database systems can offer substantial performance advantages, in particular when used in the statistical and analytical kinds of applications for which the concept was originally developed (cf. Section 7.5).

An important conceptual step associated with the use of transposed files is that it opens up a whole range of new architectural opportunities. Below is a partial list of such opportunities. We note, however, that only a subset of these techniques has been widely used in systems currently available:

- column-wise storage of data, in place of the conventional row-wise data storage layout used in most relational database management systems (RDBMS), can eliminate unnecessary data access if only a subset of the columns is involved in the query
- clustering, in particular sort ordering, of attribute values can speed up searches over column data
- various kinds of lightweight data compression (minimum byte size, dictionary encoding, differencing of attribute value sequences) can reduce the amount of data moved from disk to memory
- run-length encoding (RLE) data compression for columns that are ordered can reduce the amount of data fetched from disk into main memory
- dynamically optimized sequential combinations of different compression techniques can reduce processing time
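
Several of the techniques listed above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: in-memory lists stand in for on-disk column files, and the function names and sample values are hypothetical, not taken from any particular system. It shows that sorting a column lets RLE collapse it into fewer runs, that a sorted column supports range searches by binary search instead of a full scan, and that dictionary encoding and differencing turn values into small integers that need fewer bytes each.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby
from typing import Dict, List, Tuple

# Illustrative sketch only: in-memory lists stand in for on-disk column files,
# and all function names are hypothetical.

def rle_encode(column: List) -> List[Tuple[object, int]]:
    """Run-length encoding: store each run of equal adjacent values as (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_decode(runs: List[Tuple[object, int]]) -> List:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]

def range_count(sorted_column: List[int], lo: int, hi: int) -> int:
    """Count values in [lo, hi] by binary search; valid only if the column is sorted."""
    return bisect_right(sorted_column, hi) - bisect_left(sorted_column, lo)

def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each distinct value with a small integer code into a dictionary."""
    dictionary: List[str] = []
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes

def delta_encode(column: List[int]) -> List[int]:
    """Differencing: keep the first value, then each value minus its predecessor."""
    return column[:1] + [b - a for a, b in zip(column, column[1:])]

def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the column from running sums of the differences."""
    column: List[int] = []
    for d in deltas:
        column.append(column[-1] + d if column else d)
    return column

def min_bytes(values: List[int]) -> int:
    """Fewest whole bytes that can hold every value (assumes non-negative ints)."""
    return max(1, (max(values, default=0).bit_length() + 7) // 8)

site = ["B", "A", "B", "A", "C", "A"]
print(len(rle_encode(site)))      # 6: unsorted, no adjacent repeats to collapse
print(rle_encode(sorted(site)))   # [('A', 3), ('B', 2), ('C', 1)]
print(range_count([1, 2, 2, 3, 5, 8], 2, 5))  # 4
print(dictionary_encode(["Ca", "Fe", "Ca", "Mg", "Fe"]))  # (['Ca', 'Fe', 'Mg'], [0, 1, 0, 2, 1])

readings = [1000, 1003, 1007, 1010]
deltas = delta_encode(readings)   # [1000, 3, 4, 3]
print(min_bytes(readings), min_bytes(deltas[1:]))  # 2 1
```

Each encoder has a lossless inverse, so the column can be reconstructed exactly. The last two lines combine differencing with minimum byte size: after the first value, every difference fits in one byte instead of two, which hints at why chaining compression steps can pay off.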
