7.5.6 Use Cases Where Vertical Databases May Not Be Appropriate
There are a number of situations where column-wise storage is comparable
to or slower than row-wise systems. The category of point-and-range queries
is usually efficiently supported in row-store databases, since the available
indexes enable quick retrieval of qualifying rows. For a small number of
qualifying rows, the data transfer is sufficiently efficient and is not perceivable
by the end user. For the same query category MonetDB often uses a sequen-
tial scan, which might be slower than searching with a B-tree index. How-
ever, for append-only data, which is the case for scientific data, new types
of compressed bitmap indexes (described in Chapter 6) require a relatively
small space overhead of only 30% of the original data. If this overhead is not
prohibitive, then all columns (or columns searched often) can be indexed to
provide efficient point-and-range queries in vertical databases.
Another source of performance overhead in vertical databases is tuple
reconstruction joins. Despite their efficient implementation, they may still
contribute a substantial cost for queries that request all attributes (referred to
as “SELECT *” queries), or queries with a large number of attributes. Here
again, using compressed bitmap indexes can mitigate this overhead, since
joining the results of qualifying tuples from each column can be done by log-
ical operations (AND, OR, NOT) over multiple bitmaps, where each bitmap
represents the result of searching the index of each column.
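The bitmap combination described above can be sketched in a few lines of Python. This is a toy illustration, not the compressed bitmap index of Chapter 6: each bitmap is an uncompressed Python integer, and the table, the column names (temperature, station), and the helper functions are hypothetical.

```python
# Toy uncompressed bitmap index: one Python int per distinct column value,
# where bit i is set when row i holds that value. Real systems compress
# these bitmaps; this sketch does not.

def build_bitmap_index(column):
    """Map each distinct value to an int whose set bits mark matching rows."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def range_bitmap(index, low, high):
    """OR together the bitmaps of all values in the closed range [low, high]."""
    result = 0
    for value, bitmap in index.items():
        if low <= value <= high:
            result |= bitmap
    return result


def row_ids(bitmap):
    """Decode a bitmap into the ascending list of row positions it marks."""
    ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids


# Hypothetical two-column table, stored column-wise.
temperature = [12, 30, 25, 30, 8, 25]
station = ["A", "B", "A", "A", "B", "B"]

temp_index = build_bitmap_index(temperature)
station_index = build_bitmap_index(station)

# WHERE temperature BETWEEN 20 AND 30 AND station = 'A'
hot = range_bitmap(temp_index, 20, 30)
at_a = station_index["A"]
qualifying = row_ids(hot & at_a)

# WHERE NOT station = 'A' (the mask keeps the result within the table's rows)
all_rows = (1 << len(station)) - 1
not_a = row_ids(all_rows & ~at_a)
```

Each column is searched independently, and the per-column results are combined with a single bitwise operation rather than a join on row identifiers.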
There are some uncommon applications where all (or most) columns are
needed in every query. In such cases there is no value to using column-wise or-
ganization, and row-wise organization with appropriate indexing (for selecting
the desired tuples given predicate conditions) may prove more efficient. Also,
row-wise organization may be more appropriate in applications where very
few rows are selected, and several columns are involved. An extensive analysis
of which organization is best was conducted in O'Neil et al. [60]. Given a characterization
of the query patterns, a formula was developed in order to determine which
organization is better. By and large, for applications where a large number
of rows is selected, and only a subset of the columns is involved in the query,
column-wise organization is superior. Furthermore, in practical experiments
described in O'Neil et al. [60], it was shown that when sequential reads (which
are much faster than random read operations) are considered as a possible
strategy, column-wise organization is even more favorable because it is much
easier to utilize sequential read operations with the vertical data organization.
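The trade-off can be made concrete with a deliberately simple cost model. The sketch below is not the formula from the cited analysis; it is an assumed, back-of-the-envelope model that counts bytes read, in which an indexed row store fetches every column of each qualifying row and an unindexed column store scans each referenced column in full. The function names, the fixed value width, and the random_read_penalty factor are all hypothetical.

```python
def row_store_cost(selected_rows, total_columns, value_bytes, random_read_penalty):
    """Indexed row store: fetch all columns of each qualifying row via random reads."""
    return selected_rows * total_columns * value_bytes * random_read_penalty


def column_store_cost(total_rows, columns_used, value_bytes):
    """Unindexed column store: sequentially scan every referenced column in full."""
    return total_rows * columns_used * value_bytes


def cheaper_layout(total_rows, total_columns, selected_rows, columns_used,
                   value_bytes=8, random_read_penalty=1.0):
    """Return 'row' or 'column', whichever layout has the lower modeled cost."""
    row_cost = row_store_cost(selected_rows, total_columns, value_bytes,
                              random_read_penalty)
    col_cost = column_store_cost(total_rows, columns_used, value_bytes)
    return "row" if row_cost < col_cost else "column"


# Hypothetical table: one million rows, 100 columns, queries touching 5 columns.
few_rows = cheaper_layout(1_000_000, 100, selected_rows=10, columns_used=5)
many_rows = cheaper_layout(1_000_000, 100, selected_rows=500_000, columns_used=5)

# Same borderline query, first treating all reads as equal, then assuming
# random reads cost ten times as much as sequential ones.
borderline = cheaper_layout(1_000_000, 100, selected_rows=40_000, columns_used=5)
borderline_random = cheaper_layout(1_000_000, 100, selected_rows=40_000,
                                   columns_used=5, random_read_penalty=10.0)
```

Under these assumptions the row layout wins only when very few rows qualify, and charging extra for random reads moves the borderline case toward the column layout, which matches the qualitative conclusion above.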
Although we prefer to reduce data redundancy, in some cases it may prove
useful to store derived data when generated, for instance, by expensive com-
putations. For example, the Neighbors table groups together pairs of SDSS
objects within an a priori distance bound of 0.5 arc-minutes. Our attempt to
replace this table with a view computing the distances proved less
efficient than accessing the precomputed table.
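The difference between the two approaches can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The schema below (objects, neighbors_view, neighbors, and their columns) is hypothetical and is not the actual SDSS schema, and the separation test uses a flat small-angle approximation that ignores the cos(dec) factor, which is only reasonable near the celestial equator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical object catalog with positions in degrees.
conn.execute(
    "CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [
        (1, 180.000, 0.000),
        (2, 180.005, 0.000),  # about 0.3 arc-minutes from object 1
        (3, 180.000, 0.020),  # about 1.2 arc-minutes from object 1
    ],
)

# Option 1: a view that recomputes every pairwise separation on each query.
# The bound is 0.5 arc-minutes, i.e. 0.5 / 60 degrees, compared in squared form.
conn.execute(
    """
    CREATE VIEW neighbors_view AS
    SELECT a.obj_id AS obj_id,
           b.obj_id AS neighbor_id,
           (a.ra_deg - b.ra_deg) * (a.ra_deg - b.ra_deg)
             + (a.dec_deg - b.dec_deg) * (a.dec_deg - b.dec_deg) AS sep_sq_deg2
    FROM objects AS a
    JOIN objects AS b ON a.obj_id <> b.obj_id
    WHERE (a.ra_deg - b.ra_deg) * (a.ra_deg - b.ra_deg)
            + (a.dec_deg - b.dec_deg) * (a.dec_deg - b.dec_deg)
          <= (0.5 / 60.0) * (0.5 / 60.0)
    """
)

# Option 2: materialize the pairs once and index them for lookup by object.
conn.execute("CREATE TABLE neighbors AS SELECT * FROM neighbors_view")
conn.execute("CREATE INDEX neighbors_by_obj ON neighbors (obj_id)")

from_view = conn.execute(
    "SELECT obj_id, neighbor_id FROM neighbors_view ORDER BY obj_id, neighbor_id"
).fetchall()
from_table = conn.execute(
    "SELECT obj_id, neighbor_id FROM neighbors ORDER BY obj_id, neighbor_id"
).fetchall()
```

Both queries return the same pairs, but the view repeats the pairwise comparison on every access, whereas the precomputed table pays that cost once at the price of storing redundant data.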