all downstream boundary buffer nodes contain a value (usually a vector value).
An operator node may execute whenever none of its inbuffer nodes are empty,
and none of its outbuffer nodes are full. In the last system version of Cantor
(1991), 31 different vectorized stream operators were available to the dataflow
network generator. They are software analogues of the machine instructions of
a vectorized dataflow computer. On modern “multicore” computers as well as
on shared-nothing multinode computer systems, Cantor's vectorized dataflow
query evaluation process could quite easily be parallelized.
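The firing rule described above can be stated directly in code. The following C sketch uses simplified stand-in structures invented for illustration (buffer_t, op_node_t, and op_can_fire are not Cantor's actual data structures):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative buffer node: holds up to 'capacity' vector values. */
typedef struct {
    size_t count;     /* vector values currently buffered */
    size_t capacity;  /* maximum vector values the node can hold */
} buffer_t;

/* Illustrative operator node with its in- and out-buffer nodes. */
typedef struct {
    buffer_t **inbufs;
    size_t     n_in;
    buffer_t **outbufs;
    size_t     n_out;
} op_node_t;

/* The firing rule: an operator node may execute whenever none of
 * its inbuffer nodes are empty and none of its outbuffer nodes are
 * full. */
static bool op_can_fire(const op_node_t *op)
{
    for (size_t i = 0; i < op->n_in; i++)
        if (op->inbufs[i]->count == 0)
            return false;          /* an input buffer is empty */
    for (size_t i = 0; i < op->n_out; i++)
        if (op->outbufs[i]->count == op->outbufs[i]->capacity)
            return false;          /* an output buffer is full */
    return true;
}
```

Because the rule depends only on local buffer state, a scheduler can evaluate it for many operator nodes concurrently, which is what makes the parallelization mentioned above straightforward.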
Boncz et al. 31 argue that database systems usually execute less than one
instruction per cycle (IPC), while in scientific computation, such as matrix
multiplication, or in multimedia processing, IPCs of two or more are not
uncommon on modern CPUs. The authors claim that database systems need
not perform so badly relative to such workloads. Based on experimental
results, they conclude that there are interpretation techniques that, if
exploited, would allow DBMS compute performance to approach that of
scientific computing workloads. A key technique by which this may be
achieved is loop pipelining, whereby interpretation overhead is distributed
over many elementary operations (a sketch follows the goals list below).
This technique is central to the vectorized prototype query processor X100,
recently designed and evaluated by the MonetDB developers. According to
Boncz et al., 31 its goal is to:
1. execute high-volume queries at high CPU efficiency,
2. be extensible to other application domains like data mining and multi-
media retrieval, and
3. scale with the size of the lowest storage hierarchy (disk).
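The loop-pipelining sketch promised above contrasts the two interpretation styles in C. A tuple-at-a-time interpreter pays its interpretation overhead (the call, the dispatch) once per value; a vectorized primitive pays it once per vector of, say, 1,000 values, leaving the compiler a tight loop it can pipeline, unroll, and auto-vectorize. The primitive below is modeled on the style of X100's map primitives but is illustrative rather than actual X100 code:

```c
#include <stddef.h>

/* Tuple-at-a-time: one interpreted call per value, so per-call
 * interpretation overhead dominates the single addition. */
static long add_one_tuple(long a, long b)
{
    return a + b;
}

/* Vectorized primitive: one interpreted call per vector.  The loop
 * body is branch-free and independent across iterations, so the
 * compiler can distribute the call overhead over n elementary
 * operations and keep the CPU pipeline full. */
static void map_add_long_vec(long *restrict res,
                             const long *restrict a,
                             const long *restrict b,
                             size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] + b[i];
}
```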
To achieve these goals, X100 must manage bottlenecks throughout the com-
puter architecture:
Disk. The columnBM I/O subsystem of X100 is geared toward efficient
sequential data access. To reduce bandwidth requirements, it uses a
vertical storage layout that in some cases is enhanced with lightweight
data compression.
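The text does not detail the compression schemes. As one example of what "lightweight" can mean here, the following C sketch shows frame-of-reference encoding, where each block stores one base value plus small fixed-width offsets; the scheme and names are illustrative and not necessarily what columnBM implements:

```c
#include <stdint.h>
#include <stddef.h>

/* Frame-of-reference (FOR) block: up to 1,024 values stored as one
 * 64-bit base plus 8-bit offsets.  A block is encodable this way only
 * if every value lies within 255 of the block minimum. */
typedef struct {
    int64_t base;        /* minimum value in the block */
    uint8_t offs[1024];  /* value[i] == base + offs[i] */
} for_block_t;

/* Decoding is a tight, branch-free loop (n <= 1024), cheap enough to
 * run without making decompression the new bottleneck. */
static void for_decode(int64_t *restrict out,
                       const for_block_t *restrict blk,
                       size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = blk->base + blk->offs[i];
}
```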
RAM. Like I/O, RAM access is carried out through explicit memory-to-
cache routines, which contain platform-specific optimizations. The same
vertically partitioned and even compressed disk data layout is used in
RAM to save space and bandwidth.
Cache. A Volcano-like 33 execution pipeline with a vectorized processing
model is used. Small vertical chunks (e.g., 1,000 values) of cache-resident
data items, called vectors, are the unit of operation for X100 execution
primitives. The CPU cache is the only place where bandwidth does not
matter, and therefore (de)compression happens on the boundary between
RAM and cache.
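To illustrate the vector-at-a-time variant of the Volcano iterator model, here is a hedged C sketch in which each next() call yields a vector of up to 1,000 values rather than a single tuple; the interfaces are invented for illustration and are not X100's actual API:

```c
#include <stddef.h>

enum { VECTOR_SIZE = 1000 };  /* small enough to stay cache-resident */

/* One column's worth of a vector: up to VECTOR_SIZE values. */
typedef struct {
    long   vals[VECTOR_SIZE];
    size_t n;                 /* number of valid values; 0 at end */
} vector_t;

/* Volcano-style pull-based operator, but next() produces a whole
 * vector per call instead of a single tuple. */
typedef struct op op_t;
struct op {
    op_t   *child;
    size_t (*next)(op_t *self, vector_t *out);  /* returns out->n */
};

/* Example operator: pulls a vector from its child and applies a
 * primitive to it, amortizing per-call overhead over up to 1,000
 * values while the data stays in cache. */
static size_t project_next(op_t *self, vector_t *out)
{
    size_t n = self->child->next(self->child, out);
    for (size_t i = 0; i < n; i++)
        out->vals[i] += 1;    /* stand-in for a real map primitive */
    out->n = n;
    return n;
}
```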
CPU. Vectorized primitives expose to the compiler that processing a tuple is
independent of the previous and next tuples. Vectorized primitives for