Emerging Database Systems in Support of Scientific Data - Scientific Data Management

Database Reference

In-Depth Information

as possible, given the above-mentioned constraints. The algorithm solved this

problem in time O(n).

For reading and writing data in b-lists, six access procedures are avail-

able, allowing sequential as well as direct-addressed read and write access of

data segments. One of the sequential read procedures does not unpack run-

compressed data sequences, enabling fast access to such data. This facility is

used by the conjunctive query search algorithm as well as by the merge-join

algorithm.

7.2.5 Buffering Techniques for Accessing Metadata

In vertical databases, metadata usually play an important role, not only to

support users with information about database contents, but also internally to

support parameter and algorithm selection during runtime. In order to exploit

the advantages of vertical segmentation to achieve the best possible runtime

performance, algorithms for searching and accessing data need to frequently

consult the metadatabase (MDB) for information about the current status of

database contents. For example, to find the fastest way to search a run-length

compressed, fully transposed, ordered file (CFTOF search, see Section 7.2.6),

it is important to be able to quickly access at runtime certain properties of

the attributes involved in the query, in particular their sort key position, as

well as their cardinality and value range. The latter information is needed to

obtain good estimates of the selectivity of search clauses, to be used when

determining which attribute access sequence the query should choose.

However, the vertical storage structure is not well suited for the manage-

ment of metadata, since the access pattern of system-generated, runtime meta-

data lookup queries and updates is much more “horizontally” clustered than

typical user queries. In fact, the access pattern of such queries is quite similar

to that of a random sequence of simple-transaction queries being issued during

a short time interval. To a first approximation the individual items in such a

sequence can be assumed to be uncorrelated, that is, when the system brings

into memory a vertically structured MDB buffer load to access one data item

in an MDB attribute, the likelihood that it will access another data item from

the same buffer load in the near future is no greater than that of a random

hit into the attribute. Therefore, the next time a data item is needed from

the same MDB attribute, it is quite likely that a new buffer load will have to

be fetched from disk. On the other hand, the likelihood is quite high that the

next few MDB accesses will involve other properties associated with the same

MDB relation, which should favor horizontal clustering. In the first case, the

hit rate for the buffered data will be low and the performance of the system

will suffer (unless the entire column fits into the buffer and can therefore be

assumed to stay in memory permanently).

A simple solution to this “impedance mismatch” problem could be to

store the metadata as a collection of hash tables or linked lists; but if one

wants the system to be able to answer general queries that involve metadata

Scientific Data Management

Search WWH ::

Custom Search

Home