70s could be orders of magnitude higher than those of processors designed
for commercial workloads. The main purpose was to make query evaluation
compute and memory bound rather than I/O bound whenever possible.
The MonetDB developers [17, 26, 31] have conducted thorough analyses of the
effect of modern computer hardware architectures on database performance.
As advances in CPU speed far outpace advances in dynamic random access
memory (DRAM) latency, the effect of optimal use of the memory caches is
becoming ever more important. In Manegold et al. 17 a detailed discussion is
presented of the impact of modern computer architectures, in particular with
respect to their use of multilevel cache memories to alleviate the continually
widening gap between DRAM and CPU speeds that has been a characteris-
tic for computer hardware evolution since the late 70s. Memory access speed
has stayed almost constant (within a factor of 2), while CPU speed has in-
creased by almost a factor of 1,000 from 1979 to 1999. Cache memories, which
have been introduced on several levels to reduce memory latency, can do so
effectively only when the requested data are found in the cache.
Manegold et al. [17] argue that it is no longer appropriate to think of a
computer system's main memory as "random access" memory, and show that
sequential access, even within main memory, can provide significant
performance advantages. They furthermore show that, unless special care is
taken, a database server running even a simple sequential scan on a table may
spend 95% of its cycles waiting for memory to be accessed. This memory-
access bottleneck is even more difficult to avoid in more complex database
operations such as sorting, aggregation, and join, which exhibit a random
access pattern. The performance advantages of exploiting sequential data ac-
cess patterns during query processing have thus become progressively more
significant as faster processor hardware has become available.
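The effect of access order can be sketched with a small experiment. The following is an illustrative Python sketch, not MonetDB code; in interpreted code the interpreter's per-operation cost dampens the effect, which is far more dramatic in a compiled language. Summing the same array once in sequential order and once in a shuffled order does identical work, but the random order defeats hardware prefetching and cache-line reuse.

```python
import random
import time

N = 1_000_000
data = list(range(N))

seq_order = list(range(N))
rand_order = list(range(N))
random.shuffle(rand_order)

def scan(order):
    """Sum data[] in the given visiting order, timing the pass."""
    t0 = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return total, time.perf_counter() - t0

seq_total, seq_t = scan(seq_order)
rand_total, rand_t = scan(rand_order)
assert seq_total == rand_total  # identical work, different access pattern
```

Both passes execute the same number of additions; any difference between `seq_t` and `rand_t` is attributable purely to the memory access pattern.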
Based on results from a detailed analytical cost model, Manegold et al. [17]
discuss the consequences of this bottleneck for data structures and algorithms
to be used in database systems and identify vertical fragmentation as the
storage layout that leads to optimal memory cache usage.
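The intuition behind vertical fragmentation is that storing each attribute contiguously lets a scan read only the bytes it needs, instead of dragging entire records through the cache. A minimal sketch follows; the table and column names are illustrative, not MonetDB's actual storage format.

```python
n = 10_000

# Row store ("horizontal" layout): each record is stored contiguously,
# so scanning one attribute still pulls whole tuples into the cache.
rows = [(i, i * 2.0, "name%d" % i) for i in range(n)]

# Vertically fragmented ("column") store: each attribute is contiguous,
# so a scan touches only the attribute it actually needs.
columns = {
    "id": list(range(n)),
    "price": [i * 2.0 for i in range(n)],
    "name": ["name%d" % i for i in range(n)],
}

row_sum = sum(r[1] for r in rows)   # reads past id and name in every tuple
col_sum = sum(columns["price"])     # reads only the price column
assert row_sum == col_sum
```

With the column layout, a scan over one attribute moves through memory strictly sequentially, which is exactly the access pattern the cache hierarchy rewards.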
A technique the MonetDB developers pioneered in database performance
research is the construction of detailed access cost models from the
hardware event counters available in modern CPUs. Such models have enabled
them, among other things, to identify a significant bottleneck in the
implementation of the partitioned hash-join and hence to improve
it using perfect hashing. Another contribution is their creation of a
calibration tool, which allows relevant performance characteristics (cache sizes, cache
line sizes, cache miss latencies) of the cache memory system to be extracted
from the operating system for use in cost models, in order to predict the per-
formance of, and to automatically tune, memory-conscious query processing
algorithms on any standard processor.
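The core idea of such a calibrator can be sketched as follows. This is a simplified Python illustration, not MonetDB's actual calibration tool (which is written in C); Python's interpreter overhead largely masks the cache effects the real tool measures, but the structure of the measurement is the same: chase pointers through a random cyclic permutation, so each load depends on the previous one and prefetching cannot help, and watch the cost per hop as the working set grows past each cache level.

```python
import random
import time

def ns_per_hop(n_slots, hops=200_000):
    """Pointer-chase a random cyclic permutation of n_slots array slots.

    Every hop depends on the previous one, so hardware prefetching
    cannot hide the latency. In compiled code, the ns/hop curve jumps
    each time the working set outgrows a cache level, revealing cache
    sizes and miss latencies.
    """
    order = list(range(n_slots))
    random.shuffle(order)
    # Link the shuffled slots into a single cycle.
    nxt = [0] * n_slots
    for i in range(n_slots):
        nxt[order[i]] = order[(i + 1) % n_slots]
    pos = 0
    t0 = time.perf_counter()
    for _ in range(hops):
        pos = nxt[pos]
    return (time.perf_counter() - t0) / hops * 1e9

# Sweep working-set sizes; knees in the curve hint at cache boundaries.
timings = {slots: ns_per_hop(slots) for slots in (1 << 10, 1 << 14, 1 << 18)}
```

In a C implementation, the measured knees yield the cache sizes and miss latencies that the cost models consume, which is what makes the models portable across processors.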
In the MonetDB developers' experience, virtual-memory advice on modern
operating systems can be utilized effectively enough to make a single-level
storage software architecture feasible. Thus, the