Database Reference
In-Depth Information
MonetDB database software architecture does not feature secondary storage
structures or explicit I/O operations, whereas the underlying “physical” stor-
age architecture is multilevel hierarchical, consisting of CPU registers on the
lowest level, two levels of hardware cache memory (L1 and L2), main memory,
and virtual swap memory on disk.
A conclusion of these studies is that database algorithms and data struc-
tures should be designed and optimized for e cient multilevel memory access
from the outset. Careless implementation of the key algorithms can lead to a
performance disaster that even faster CPUs will not be able to rescue, whereas
careful design can lead to an order of magnitude performance improvement.
The authors claim, on very good analytical and experimental grounds, that
the vertical decomposition storage feature is in fact the basis of achieving such
high performance.
One final achievement of these design studies and subsequent implementa-
tion improvements is a data mining benchmark result, which is two orders of
magnitude better than that of some commercial database products.
7.2.3 Vectorization and the Data-Flow Execution Model
The use of vector operators (vectorization) in interpretive query evaluation
aims at distributing the (usually heavy) interpretation overhead over many
elementary CPU operations. It is the same basic idea that is exploited in a vec-
torized CPU, although used in this context primarily to improve the eciency
of a software process. It turns out, however, that it is not straightforward to
devise an ecient vectorized query evaluation process. This is a topic dis-
cussed at some length by Boncz et al. 31 It would seem that the MonetDB
developers are more or less alone in taking advantage of this approach today.
In Karasalo and Svensson, 5 a vectorized interpretation technique for query
evaluation was presented, claiming to be analogous to the operation of a vec-
torized dataflow computer. 32 This architectural approach was chosen to make
sense of the transposed file architecture when extended to support general re-
lational queries from the basic access patterns and conjunctive queries previ-
ously studied 3 , 16 and briefly discussed in Section 7.2.6. It has in fact several key
features in common with those of MonetDB/X100 described in Section 7.4.4.
Karasalo and Svensson 5 survey the methods that were developed for trans-
lating the parsed and syntactically optimized expression of a relational query
into an execution plan in the form of one or more hierarchies of static dataflow
networks. Each network in the execution plan is a bipartite graph , that is, if in
such a network two nodes are connected by an arc, then one is a data buffer
node and the other an operator node.
Network generation is followed by an execution phase, which proceeds in
two stages: (1) when initializing a network hierarchy for evaluation, space is
assigned to its buffers from a buffer pool common to all networks; and (2) when
evaluating a network, initially all but the upstream boundary buffer nodes are
empty. Evaluation proceeds by executing operator nodes in some order until
Search WWH ::




Custom Search