segments. To do this, C-Store has to join segments from different projections,
which is accomplished using storage keys and join indexes. Each segment
associates every data value of every column with a storage key SK. Values
from different columns in the same segment with matching storage keys belong
to the same row. Storage keys in RS are implicit, while storage keys in WS are
explicitly represented as integers, larger than the largest storage key in RS.
Assuming that T1 and T2 are projections that together cover the attributes
of a table T, an entry in the join index for a given tuple in a segment of T1
contains the segment ID and storage key of the joining tuple in T2. Since all
join indexes are between projections anchored at the same table, this is always
a one-to-one mapping. This strategy contributes to efficient join operations in
C-Store.
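The interaction between storage keys and join indexes can be sketched as follows. This is an illustrative model, not C-Store's actual code: the table, projections, and function names are assumptions, and storage keys are simply positions within a segment.

```python
# Table T(name, dept, salary) is covered by two projections:
#   T1 = (name, dept)   -- segment 0, sorted by dept
#   T2 = (name, salary) -- segment 0, sorted by salary
# Each column value is associated with a storage key (here, its list index).

T1 = {  # segment 0 of projection T1; list index = storage key
    "name": ["alice", "carol", "bob"],
    "dept": ["eng", "eng", "sales"],
}
T2 = {  # segment 0 of projection T2, stored in a different order
    "name": ["bob", "alice", "carol"],
    "salary": [50, 70, 90],
}

# Join index from T1 to T2: for each (segment ID, storage key) in T1,
# the (segment ID, storage key) of the joining tuple in T2. Because both
# projections are anchored at the same table T, the mapping is one-to-one.
join_index = {
    (0, 0): (0, 1),  # alice
    (0, 1): (0, 2),  # carol
    (0, 2): (0, 0),  # bob
}

def reconstruct_row(t1_sk):
    """Rebuild a full row of T from its T1 storage key via the join index."""
    t2_seg, t2_sk = join_index[(0, t1_sk)]
    return (T1["name"][t1_sk], T1["dept"][t1_sk], T2["salary"][t2_sk])

print(reconstruct_row(0))  # -> ('alice', 'eng', 70)
```

The essential point is that row reconstruction never compares values: matching storage keys within a segment identify the row, and the join index translates keys across projections.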
7.4 The Architecture and Evolution of MonetDB
7.4.1 Design Principles
MonetDB was introduced in the previous section as a database system
that uses vertical decomposition in order to reduce disk I/O. However, the
principal motivation for this storage model was not so much to reduce disk
I/O in scientific or business intelligence query loads, though that is certainly
one of its effects. Rather, the MonetDB architecture was based on other con-
siderations given in the original Decomposition Storage Model (DSM) 12 paper,
namely data storage layout and query algebra, with the purpose of
achieving higher CPU efficiency.
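The core idea of the DSM can be sketched in a few lines. This is an illustrative model under assumed names, not MonetDB code: each attribute of a relation is stored as its own binary (surrogate, value) relation, so a query touching one column scans only that column.

```python
# A conventional row-wise relation (illustrative data).
people = [
    ("alice", "eng", 70),
    ("bob", "sales", 50),
]

def decompose(rows, attr_names):
    """DSM-style vertical decomposition: split a row-wise relation into
    one binary (surrogate, value) relation per attribute, where the
    surrogate (oid) identifies the original row."""
    return {
        name: [(oid, row[i]) for oid, row in enumerate(rows)]
        for i, name in enumerate(attr_names)
    }

columns = decompose(people, ["name", "dept", "salary"])
print(columns["salary"])  # -> [(0, 70), (1, 50)]
```

A scan over `salary` now reads one dense array of pairs rather than striding across full tuples; rows are recombined by matching surrogates, much as C-Store recombines projections by storage key.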
When the original relational databases appeared, CPU hardware followed
an in-order, single-pipeline, one-instruction-at-a-time design with a low clock
frequency, such that RAM latency took just a few cycles and disk I/O was
the dominant factor in database performance. In modern multi-GHz comput-
ers, however, the cycle cost of a CPU instruction is highly variable: in
pipelined CPU designs that issue multiple instructions per clock, it depends
on the CPU cache hit ratio, on the branch misprediction ratio (see Ross 52 ),
and on dependencies on other instructions (fewer dependencies leading to
faster execution). In other words, the difference in throughput between
“good” and “bad” program code has been increasing significantly with newer CPU designs, and
it became clear that traditional database code fell mostly into
the “bad” basket. 28 Therefore, MonetDB attempted to follow a query execu-
tion strategy radically different from the prevalent tuple-at-a-time pull-based
iterator approach (in which each operator obtains its input by calling the opera-
tors of its children in the operator tree), as that approach can be linked to the “bad”
performance characteristics of database code. MonetDB aimed at mimicking