Emerging Database Systems in Support of Scientific Data - Scientific Data Management

appear in the join expression. The result of this phase is another collection

of temporary index lists indicating which tuples in each conceptual relation

satisfy the query. Since a join index clustered on the desired TID exists for all

entity-based equi-joins, a full scan can always be avoided. During the value

materialization phase several independent joins are evaluated, preferably in

parallel. The join operands are small binary relations containing only TIDs.

The final composition phase executes an m-way merge join, which permits a

large degree of parallelism. Its operands are all small binary relations containing

only TID lists whose cardinality has been maximally reduced by the

select operations.
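
The composition step can be illustrated with a minimal sketch. It assumes each attribute is stored as a binary relation of (TID, value) pairs sorted by TID, with unique TIDs, and that the select operations have already reduced two of the operands. The function name m_way_merge_join and the sample columns are hypothetical and are not taken from the DSM papers; the sketch shows only the synchronized merge on TID, not the full multi-phase algorithm or its parallel execution.

```python
def m_way_merge_join(relations):
    """Merge-join m binary relations of (tid, value) pairs on tid.

    Assumes every relation is sorted by tid and holds each tid at most once.
    Returns tuples of the form (tid, value_1, ..., value_m).
    """
    positions = [0] * len(relations)
    joined = []
    while relations and all(p < len(r) for p, r in zip(positions, relations)):
        heads = [r[p] for p, r in zip(positions, relations)]
        highest = max(tid for tid, _ in heads)
        if all(tid == highest for tid, _ in heads):
            # Every operand agrees on this tid: emit one result tuple.
            joined.append((highest, *(value for _, value in heads)))
            positions = [p + 1 for p in positions]
        else:
            # Advance only the operands that lag behind the highest tid.
            positions = [
                p + 1 if tid < highest else p
                for p, (tid, _) in zip(positions, heads)
            ]
    return joined


# Hypothetical sample data: one binary relation per attribute.
age = [(1, 30), (2, 45), (3, 52), (4, 28)]
dept = [(1, "physics"), (2, "biology"), (3, "physics"), (4, "physics")]
name = [(1, "Ada"), (2, "Bo"), (3, "Chen"), (4, "Dee")]

# Selections shrink the operands before the merge join runs.
age_selected = [(tid, v) for tid, v in age if v >= 30]
dept_selected = [(tid, v) for tid, v in dept if v == "physics"]

result = m_way_merge_join([age_selected, dept_selected, name])
```

Because every operand is sorted on TID, the join needs only one forward pass over each list, which is one reason small, pre-reduced TID lists keep this phase cheap.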

The practical conclusions from this work, reported in Valduriez et al. 25 and

cited in Khoshafian et al., 23 are (1) that DSM with join indexes provides better

retrieval performance than NSM when the number of retrieved attributes is

low or the number of retrieved records is medium to high, but NSM provides

better retrieval performance when the number of retrieved attributes is high

and the number of retrieved records is low; and (2) that the performance of

single attribute modification is the same for both DSM and NSM, but NSM

provides better record insert/delete performance.
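
Conclusion (2) can be made concrete with a small sketch that counts logical writes under each layout. The classes NSMTable and DSMTable are hypothetical illustrations, not code from the cited studies, and the counts are logical structure updates rather than measured I/O. The sketch also ignores the cost of maintaining join indexes, which would add further work on the DSM side.

```python
class NSMTable:
    """Row layout: each TID maps to one complete record."""

    def __init__(self, attributes):
        self.attributes = list(attributes)
        self.records = {}

    def insert(self, tid, record):
        self.records[tid] = list(record)
        return 1  # one record is written

    def update(self, tid, attribute, value):
        self.records[tid][self.attributes.index(attribute)] = value
        return 1  # one record is touched


class DSMTable:
    """Decomposed layout: one binary relation (TID -> value) per attribute."""

    def __init__(self, attributes):
        self.columns = {name: {} for name in attributes}

    def insert(self, tid, record):
        for name, value in zip(self.columns, record):
            self.columns[name][tid] = value
        return len(self.columns)  # one write per attribute relation

    def update(self, tid, attribute, value):
        self.columns[attribute][tid] = value
        return 1  # only the affected attribute relation is touched


attributes = ["name", "age", "dept"]
nsm, dsm = NSMTable(attributes), DSMTable(attributes)

nsm_insert_writes = nsm.insert(1, ("Ada", 30, "physics"))
dsm_insert_writes = dsm.insert(1, ("Ada", 30, "physics"))
nsm_update_writes = nsm.update(1, "age", 31)
dsm_update_writes = dsm.update(1, "age", 31)
```

Under these assumptions a single-attribute update costs one write in both layouts, while an insert costs one write in NSM and one write per attribute in DSM.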

This approach is similar to those used in MonetDB 17, 26 and in Cantor, 27

with the following main differences: (1) DSM provides two predefined join

indexes for each attribute, one clustered on each of the two attributes (attribute

value, TID), while Cantor and MonetDB both use indexes that are created

as needed during query evaluation; (2) Cantor stores these indexes using RLE

compression, while MonetDB introduces a novel radix cluster algorithm for hash

join; (3) although potentially important, parallelism has not been presented

as a key design issue for MonetDB, nor was it one for Cantor; (4) the

algorithms used in MonetDB and Cantor were both presented as simple two-way

joins, corresponding mainly to the composition phase in the DSM algorithm,

which is presented as an m-way join.
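
To illustrate why RLE suits an index clustered on attribute value, the following sketch encodes a sorted column as (value, run length) pairs. It is generic run-length encoding written for illustration only; the function names and the sample column are assumptions, and Cantor's actual storage format is not described in this section.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical column clustered on attribute value, so equal values are adjacent.
sorted_column = ["biology", "biology", "physics", "physics", "physics"]
runs = rle_encode(sorted_column)
restored = rle_decode(runs)
```

Sorting on the attribute value places equal values next to each other, so the number of runs is bounded by the number of distinct values rather than by the number of tuples.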

7.2.2 The Impact of Modern Processor Architectures

Research has shown that DBMS performance may be strongly affected by

“cache misses” 28 and can be much improved by use of cache-conscious data

structures, including column-wise storage layouts such as DSM and within-page

vertical partitioning techniques. 29 In Ailamaki et al. 28 (p. 266) this observation

is summarized as follows: “Due to the sophisticated techniques used

for hiding I/O latency and the complexity of modern database applications,

DBMSs are becoming compute and memory bound.” In Boncz et al., 30 it is

noted that past research on main-memory databases has shown that main-

memory execution needs different optimization criteria than those used in

I/O-dominated systems.
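
A back-of-the-envelope sketch shows why column-wise layouts can reduce cache misses when a query scans a single attribute. All sizes here are assumptions chosen for illustration (64-byte cache lines, 64-byte records, a 4-byte attribute, data aligned at offset 0), and the function cache_lines_touched is hypothetical. It counts distinct cache lines referenced and does not model prefetching, associativity, or any real processor.

```python
def cache_lines_touched(num_values, stride_bytes, value_bytes, line_bytes=64):
    """Count distinct cache lines read when scanning one attribute.

    stride_bytes is the distance between consecutive values of the attribute:
    the full record size for a row layout, the value size for a column layout.
    """
    lines = set()
    for i in range(num_values):
        start = i * stride_bytes
        end = start + value_bytes - 1
        lines.update(range(start // line_bytes, end // line_bytes + 1))
    return len(lines)


num_records = 1000
# Row layout (NSM): consecutive values of the attribute are one record apart.
nsm_lines = cache_lines_touched(num_records, stride_bytes=64, value_bytes=4)
# Column layout (DSM): values of the attribute are stored contiguously.
dsm_lines = cache_lines_touched(num_records, stride_bytes=4, value_bytes=4)
```

Under these assumptions the row layout touches one cache line per record, while the column layout packs sixteen values into each line.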

On the other hand, it was a common goal early on for scientific database

management systems (SDBMS) development projects to exploit the superior

CPU power of computers used for scientific applications, which in the early
