all downstream boundary buffer nodes contain a value (usually a vector value).
An operator node may execute whenever none of its inbuffer nodes are empty,
and none of its outbuffer nodes are full. In the last system version of Cantor
(1991), 31 different vectorized stream operators were available to the dataflow
network generator. They are software analogues of the machine instructions of
a vectorized dataflow computer. On modern “multicore” computers as well as
on shared-nothing multinode computer systems, Cantor's vectorized dataflow
query evaluation process could quite easily be parallelized.
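The firing rule described above can be stated directly in code. The following C sketch uses simplified stand-in structures invented for illustration (buffer_t, op_node_t, and op_can_fire are not Cantor's actual data structures):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative buffer node: holds up to 'capacity' vector values. */
typedef struct {
    size_t count;     /* vector values currently buffered */
    size_t capacity;  /* maximum vector values the node can hold */
} buffer_t;

/* Illustrative operator node with its in- and out-buffer nodes. */
typedef struct {
    buffer_t **inbufs;
    size_t     n_in;
    buffer_t **outbufs;
    size_t     n_out;
} op_node_t;

/* The firing rule: an operator node may execute whenever none of
 * its inbuffer nodes are empty and none of its outbuffer nodes are
 * full. */
static bool op_can_fire(const op_node_t *op)
{
    for (size_t i = 0; i < op->n_in; i++)
        if (op->inbufs[i]->count == 0)
            return false;          /* an input buffer is empty */
    for (size_t i = 0; i < op->n_out; i++)
        if (op->outbufs[i]->count == op->outbufs[i]->capacity)
            return false;          /* an output buffer is full */
    return true;
}
```

Because the rule depends only on local buffer state, a scheduler can evaluate it for many operator nodes concurrently, which is what makes the parallelization mentioned above straightforward.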
Boncz et al. 31 argue that database systems usually execute less than one
instruction per cycle (IPC), while in scientific computation, such as matrix
multiplication, or in multimedia processing, IPCs of two or more are not
uncommon on modern CPUs. The authors claim that database systems need
not perform so badly relative to such workloads. Based on experimental
results, they conclude that there are interpretation techniques that, if
exploited, would allow DBMS compute performance to approach that of
scientific computing workloads. A key technique by which this may be
achieved is loop pipelining, whereby interpretation overhead is distributed
over many elementary operations (a sketch follows the goals list below).
This technique is central to the vectorized prototype query processor X100,
recently designed and evaluated by the MonetDB developers. According to
Boncz et al., 31 its goal is to:
1. execute high-volume queries at high CPU efficiency,
2. be extensible to other application domains like data mining and multi-
media retrieval, and
3. scale with the size of the lowest storage hierarchy (disk).
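The loop-pipelining sketch promised above contrasts the two interpretation styles in C. A tuple-at-a-time interpreter pays its interpretation overhead (the call, the dispatch) once per value; a vectorized primitive pays it once per vector of, say, 1,000 values, leaving the compiler a tight loop it can pipeline, unroll, and auto-vectorize. The primitive below is modeled on the style of X100's map primitives but is illustrative rather than actual X100 code:

```c
#include <stddef.h>

/* Tuple-at-a-time: one interpreted call per value, so per-call
 * interpretation overhead dominates the single addition. */
static long add_one_tuple(long a, long b)
{
    return a + b;
}

/* Vectorized primitive: one interpreted call per vector.  The loop
 * body is branch-free and independent across iterations, so the
 * compiler can distribute the call overhead over n elementary
 * operations and keep the CPU pipeline full. */
static void map_add_long_vec(long *restrict res,
                             const long *restrict a,
                             const long *restrict b,
                             size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] + b[i];
}
```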
To achieve these goals, X100 must manage bottlenecks throughout the com-
puter architecture:
Disk. The columnBM I/O subsystem of X100 is geared toward efficient
sequential data access. To reduce bandwidth requirements, it uses a
vertical storage layout that in some cases is enhanced with lightweight
data compression.
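The text does not detail the compression schemes. As one example of what "lightweight" can mean here, the following C sketch shows frame-of-reference encoding, where each block stores one base value plus small fixed-width offsets; the scheme and names are illustrative and not necessarily what columnBM implements:

```c
#include <stdint.h>
#include <stddef.h>

/* Frame-of-reference (FOR) block: up to 1,024 values stored as one
 * 64-bit base plus 8-bit offsets.  A block is encodable this way only
 * if every value lies within 255 of the block minimum. */
typedef struct {
    int64_t base;        /* minimum value in the block */
    uint8_t offs[1024];  /* value[i] == base + offs[i] */
} for_block_t;

/* Decoding is a tight, branch-free loop (n <= 1024), cheap enough to
 * run without making decompression the new bottleneck. */
static void for_decode(int64_t *restrict out,
                       const for_block_t *restrict blk,
                       size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = blk->base + blk->offs[i];
}
```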
RAM. Like I/O, RAM access is carried out through explicit memory-to-
cache routines, which contain platform-specific optimizations. The same
vertically partitioned and even compressed disk data layout is used in
RAM to save space and bandwidth.
Cache. A Volcano-like 33 execution pipeline with a vectorized processing
model is used. Small vertical chunks (e.g., 1,000 values) of cache-resident
data items, called vectors, are the unit of operation for X100 execution
primitives. The CPU cache is the only place where bandwidth does not
matter, and therefore (de)compression happens on the boundary between
RAM and cache.
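To illustrate the vector-at-a-time variant of the Volcano iterator model, here is a hedged C sketch in which each next() call yields a vector of up to 1,000 values rather than a single tuple; the interfaces are invented for illustration and are not X100's actual API:

```c
#include <stddef.h>

enum { VECTOR_SIZE = 1000 };  /* small enough to stay cache-resident */

/* One column's worth of a vector: up to VECTOR_SIZE values. */
typedef struct {
    long   vals[VECTOR_SIZE];
    size_t n;                 /* number of valid values; 0 at end */
} vector_t;

/* Volcano-style pull-based operator, but next() produces a whole
 * vector per call instead of a single tuple. */
typedef struct op op_t;
struct op {
    op_t   *child;
    size_t (*next)(op_t *self, vector_t *out);  /* returns out->n */
};

/* Example operator: pulls a vector from its child and applies a
 * primitive to it, amortizing per-call overhead over up to 1,000
 * values while the data stays in cache. */
static size_t project_next(op_t *self, vector_t *out)
{
    size_t n = self->child->next(self->child, out);
    for (size_t i = 0; i < n; i++)
        out->vals[i] += 1;    /* stand-in for a real map primitive */
    out->n = n;
    return n;
}
```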
CPU. Vectorized primitives expose to the compiler that processing a tuple is
independent of the previous and next tuples. Vectorized primitives for