Emerging Database Systems in Support of Scientific Data - Scientific Data Management

Database Reference

In-Depth Information

With respect to query-processing algorithms, the MonetDB developers

have shown that a novel radix-cluster algorithm for hash-join is better than

standard bucket-chained alternatives. In a radix-cluster algorithm, both rela-

tions are first partitioned on hash number into a number of separate clusters

which each fit the memory cache, before appropriately selected pairs of clus-

ters are hash-joined together (see Manegold et al. 17 , 26 for details). The result

of a hash-join is a binary association table that contains the (TID1, TID2)

combinations of matching tuples, that is, a join index. As indicated above,

subsequent tuple reconstruction is a cheap operation that does not need to be

included in the analysis.

The architectural design and key features of the MonetDB system are pre-

sented in Section 7.4, and experience from using it in a large-scale scientific

database application is presented in Section 7.5.

7.3.2 C-Store

C-Store 8 , 50 features a two-level store with one writable part and one, typically

much larger, read-only part. Both storage levels are column oriented. The use

of this principle, called a differential file , in data management was studied in

Severance and Lohman, 51 although its full realization in a relational database

may perhaps have been first achieved in C-Store much later. This way, C-Store

attempts to resolve the conflicting requirements for a fast and safe parallel-

writable store (WS) and a powerful read-only query processor and storage

system (RS). Tuples are periodically moved from WS to RS by a batch update

process. Although the storage model of C-Store is more complex than the DSM

in order to also manage queries of simple-transaction type eciently, most of

its basic design principles are similar to those of DSM and hence best suited

for read-optimized databases.

To store data, C-Store implements projections . A C-Store projection is an-

chored on a given relation table, and contains one or more attributes from this

table, retaining any duplicate rows. In addition, a projection can contain any

number of attributes from other tables as long as there is a complete sequence

of foreign key relationships from the anchor table to the table from which an

attribute is obtained. Hence, a projection has the same number of rows as its

anchor table. The attributes in a projection are stored column-wise, using one

separate storage structure per attribute. Tuples in a projection are sorted with

respect to the same sort key, that is, any attribute or sequence of distinct at-

tributes of the projection. Finally, every projection is horizontally partitioned

into one or more segments , each of which is given a segment identifier value

sid , based on the sort key of the projection. Hence, each segment of a given

projection is associated with a key range of the sort key for the projection.

To answer SQL queries in C-Store, there has to exist at least one covering

set of projections for every table in the logical schema. In addition, it must be

possible to reconstruct complete rows of all tables from the collection of stored

Search WWH ::

Custom Search

Home