Emerging Database Systems in Support of Scientific Data - Scientific Data Management


7.5.6 Use Cases Where Vertical Databases May Not Be Appropriate

There are a number of situations where column-wise storage is comparable to or slower than row-wise storage. Point-and-range queries are usually supported efficiently by row-store databases, since the available indexes enable quick retrieval of qualifying rows. When only a small number of rows qualify, the data transfer is efficient enough that the end user does not notice it. For the same category of queries, MonetDB often uses a sequential scan, which can be slower than searching a B-tree index. However, for append-only data, as is typical of scientific data, new types of compressed bitmap indexes (described in Chapter 6) require a relatively small space overhead of only 30% of the original data. If this overhead is not prohibitive, all columns (or the columns searched most often) can be indexed to provide efficient point-and-range queries in vertical databases.
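The bitmap-index approach to point-and-range queries can be sketched as follows. This is a minimal, uncompressed illustration. It assumes the column holds small non-negative integers, for example values that have already been binned. The class `BitmapIndex` and the helper `row_ids` are hypothetical names. Python integers stand in for the compressed bitmaps of Chapter 6.

```python
from collections import defaultdict


class BitmapIndex:
    """Uncompressed equality-encoded bitmap index over one column (sketch).

    Assumption: one bitmap per distinct value is affordable, e.g. values
    are pre-binned. Real schemes compress these bitmaps; here a Python int
    acts as an unbounded bitset whose bit i stands for row position i.
    """

    def __init__(self) -> None:
        self.bitmaps: dict[int, int] = defaultdict(int)  # value -> bitset
        self.num_rows = 0

    def append(self, value: int) -> None:
        # Append-only: a new row sets exactly one bit in one bitmap,
        # so existing bitmaps never have to be rebuilt.
        self.bitmaps[value] |= 1 << self.num_rows
        self.num_rows += 1

    def point(self, value: int) -> int:
        # Point query v == value: return that value's bitmap (or empty).
        return self.bitmaps.get(value, 0)

    def range(self, lo: int, hi: int) -> int:
        # Range query lo <= v <= hi: OR together the matching bitmaps.
        result = 0
        for value, bits in self.bitmaps.items():
            if lo <= value <= hi:
                result |= bits
        return result


def row_ids(bits: int) -> list[int]:
    # Decode a bitset into ascending row positions.
    ids = []
    pos = 0
    while bits:
        if bits & 1:
            ids.append(pos)
        bits >>= 1
        pos += 1
    return ids


idx = BitmapIndex()
for v in [3, 1, 4, 1, 5, 9, 2, 6]:
    idx.append(v)
print(row_ids(idx.point(1)))     # [1, 3]
print(row_ids(idx.range(2, 5)))  # [0, 2, 4, 6]
```

Because each appended row touches a single bitmap, the index never needs reorganizing as data arrive. This is one reason bitmap indexes pair well with append-only scientific archives. A compressed encoding would additionally keep the long runs of zero bits cheap to store.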

Another source of performance overhead in vertical databases is tuple-reconstruction joins. Despite their efficient implementation, these joins can still add substantial cost to queries that request all attributes (referred to as “SELECT *” queries) or a large number of attributes. Here again, compressed bitmap indexes can mitigate this overhead. Joining the qualifying tuples from each column can be done with logical operations (AND, OR, NOT) over multiple bitmaps, where each bitmap represents the result of searching one column's index.
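Combining per-column bitmaps with logical operations and then reconstructing tuples by position might look like the sketch below. The example table, the predicates, and the helper names `predicate_bitmap` and `reconstruct` are all hypothetical. A linear pass over each column stands in for an index lookup.

```python
def predicate_bitmap(column, pred):
    # Stand-in for an index search: bit i is set if row i satisfies pred.
    bits = 0
    for pos, value in enumerate(column):
        if pred(value):
            bits |= 1 << pos
    return bits


def reconstruct(columns, bits):
    # Tuple reconstruction: for each set bit, fetch the value at that
    # position from every column and assemble one row.
    rows = []
    pos = 0
    while bits:
        if bits & 1:
            rows.append(tuple(col[pos] for col in columns.values()))
        bits >>= 1
        pos += 1
    return rows


# Hypothetical column-wise table with three columns of equal length.
columns = {
    "a": [5, 12, 20, 8, 15],
    "b": [1, 0, 3, 2, 7],
    "c": ["p", "q", "r", "s", "t"],
}
n = len(columns["a"])
all_rows = (1 << n) - 1  # Python ints are unbounded, so NOT needs a mask

# SELECT * WHERE a > 10 AND NOT (b == 0)
a_bits = predicate_bitmap(columns["a"], lambda v: v > 10)
b_zero = predicate_bitmap(columns["b"], lambda v: v == 0)
qualifying = a_bits & (~b_zero & all_rows)
print(reconstruct(columns, qualifying))  # [(20, 3, 'r'), (15, 7, 't')]
```

The predicates are resolved entirely on bitmaps. The columns themselves are touched only at the positions that survive, which keeps reconstruction proportional to the result size rather than the table size.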

There are some uncommon applications where all (or most) columns are needed in every query. In such cases column-wise organization offers no advantage, and row-wise organization with appropriate indexing (for selecting the desired tuples given predicate conditions) may prove more efficient. Row-wise organization may also be more appropriate in applications where very few rows are selected and several columns are involved. An extensive analysis of which organization is best was conducted in [60]. Given a characterization of the query patterns, a formula was developed to determine which organization is better. By and large, column-wise organization is superior for applications where a large number of rows is selected and only a subset of the columns is involved in the query. Furthermore, practical experiments described in O'Neil et al. [60] showed that column-wise organization is even more favorable when sequential reads are considered as a possible strategy. Sequential reads are much faster than random reads, and they are far easier to exploit with a vertical data organization.
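The trade-off described above can be made concrete with a toy I/O cost model. This model is our own simplification. It is not the formula developed by O'Neil et al., and the constant `random_penalty` is a hypothetical cost for one random row fetch, measured in sequential-byte equivalents. The model counts bytes read and charges a penalty only for random accesses. It ignores caching, compression, and the cost of tuple reconstruction.

```python
# Toy I/O cost model (hypothetical; not the formula from O'Neil et al.).
# Unit: reading one byte sequentially costs 1.
# random_penalty: assumed extra cost of one random row fetch.


def row_store_scan(n_rows, n_cols, width):
    # Read every row in full, sequentially.
    return n_rows * n_cols * width


def row_store_index(n_rows, n_cols, width, selectivity, random_penalty):
    # Fetch only qualifying rows, each via one random access.
    return selectivity * n_rows * (random_penalty + n_cols * width)


def column_store_scan(n_rows, cols_used, width):
    # Read only the columns the query touches, sequentially.
    return n_rows * cols_used * width


def best(n_rows, n_cols, cols_used, width, selectivity, random_penalty):
    costs = {
        "row scan": row_store_scan(n_rows, n_cols, width),
        "row index": row_store_index(
            n_rows, n_cols, width, selectivity, random_penalty),
        "column scan": column_store_scan(n_rows, cols_used, width),
    }
    return min(costs, key=costs.get), costs


# 1M rows, 100 columns of 8 bytes, query touches 5 columns.
print(best(1_000_000, 100, 5, 8, 0.10, 10_000)[0])   # column scan
print(best(1_000_000, 100, 5, 8, 0.001, 10_000)[0])  # row index
```

Even this crude model reproduces the qualitative picture. With many qualifying rows and few columns, the column scan wins. With very few qualifying rows, an indexed row store wins. When a query touches every column, the column scan reads exactly as many bytes as a row scan, so column-wise organization gains nothing.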

Although we prefer to reduce data redundancy, in some cases it may prove useful to store derived data, for instance data generated by expensive computations. For example, the Neighbors table groups together pairs of SDSS objects within an a priori distance bound of 0.5 arc-minutes. Our attempt to replace this table with a view that computes the distances proved less efficient than accessing the precomputed table.
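Precomputing a Neighbors-style table pays the distance computation once, at load time, instead of on every query as a view would. The sketch below illustrates that idea under stated assumptions. Objects are hypothetical `(id, ra, dec)` tuples in degrees. The function names are illustrative. A naive all-pairs loop is used for brevity, which is not how a production catalog would build such a table; a real build would use a spatial index. The sketch stores each unordered pair once.

```python
import math
from itertools import combinations

BOUND_ARCMIN = 0.5  # distance bound from the text


def angular_sep_arcmin(ra1, dec1, ra2, dec2):
    # Haversine formula on the celestial sphere; inputs in degrees.
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(h) slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 60


def build_neighbors(objects, bound=BOUND_ARCMIN):
    # Materialize (id1, id2, distance) rows once, for every pair
    # of objects within the bound.
    table = []
    for (id1, ra1, dec1), (id2, ra2, dec2) in combinations(objects, 2):
        d = angular_sep_arcmin(ra1, dec1, ra2, dec2)
        if d <= bound:
            table.append((id1, id2, d))
    return table


objects = [(1, 180.0, 0.0), (2, 180.005, 0.0), (3, 181.0, 0.0)]
neighbors = build_neighbors(objects)
print(neighbors)  # one pair: objects 1 and 2, about 0.3 arc-minutes apart
```

Later queries that need nearby objects read the materialized rows instead of recomputing trigonometric distances over all candidate pairs. This is the trade of extra storage for query speed described above.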
