The use of a vertical data layout not only on disk but also throughout
query processing turned out to be beneficial, especially when operators ac-
cess data sequentially. Random data access, even if data fits into RAM, is
difficult to make efficient, especially if the accessed region does not fit into
the CPU cache. In fact, random access does not exploit all the RAM band-
width optimally; this is typically only achieved if the CPU detects a sequential
pattern and the hardware prefetcher is activated. Therefore main-memory al-
gorithms that have predominantly sequential access tend to outpace random-
access algorithms, even if they do more CPU work. Sequential algorithms, in
turn, strongly favor vertical storage, as memory accesses are dense regardless
of whether a query touches all table columns. Also, sequentially processing
densely packed data allows compilers to generate single instruction, multiple
data (SIMD) code, which further accelerates processing on modern machines.
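
The layout argument above can be made concrete with a minimal sketch in Python. It is not MonetDB code: the three-column table and the names rows, columns, sum_price_row_store, and sum_price_column_store are invented for illustration. The same data is stored once row-wise and once column-wise, and a query that touches a single attribute is run against both. Python will not expose the cache or SIMD effect itself; the sketch only shows the shape of the memory access.

```python
from array import array

# Hypothetical table with three attributes: (id, price, quantity).
rows = [(1, 10, 3), (2, 25, 1), (3, 40, 7), (4, 5, 2)]

# Vertical (DSM-style) layout: one densely packed array per attribute.
columns = {
    "id": array("q", (r[0] for r in rows)),
    "price": array("q", (r[1] for r in rows)),
    "quantity": array("q", (r[2] for r in rows)),
}


def sum_price_row_store(table):
    """Walk every tuple, stepping over the attributes the query ignores."""
    total = 0
    for record in table:
        total += record[1]
    return total


def sum_price_column_store(cols):
    """Scan a single contiguous array; untouched columns are never read."""
    return sum(cols["price"])


row_total = sum_price_row_store(rows)
col_total = sum_price_column_store(columns)
```

Both functions return the same total, but only the column-wise one reads a dense run of values of the attribute it needs, which is the access pattern the paragraph describes as favorable.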
Finally, the idea articulated in the DSM paper, 12 that DSM could be the
physical data model building block that can power many more complex user-
level data models, was validated in the case of MonetDB, where a number
of diverse front ends were built. We describe briefly below the way BATs
were used for processing of different front-end data models and their query
languages.
SQL. The relational front-end decomposes tables by column, in BATs with a
dense (nonstored) TID head, and a tail column with values. For each ta-
ble, a BAT with deleted positions is kept. For each column, an additional
BAT with inserted values is kept. These delta BATs are designed to delay
updates to the main columns and allow a relatively cheap snapshot iso-
lation mechanism (only the delta BATs are copied). MonetDB/SQL also
keeps additional BATs for join indexes, and value indexes are created
on the fly.
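
The delta scheme just described can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not MonetDB's implementation: the classes DeltaColumn and DeltaTable and all their methods are hypothetical names. Each column holds a main array plus an insert delta, the table holds one set of deleted positions, and a snapshot copies only the deltas while sharing the main arrays. Anything the text does not describe, such as how deltas are eventually merged into the main columns, is left out.

```python
from array import array
from dataclasses import dataclass, field


@dataclass
class DeltaColumn:
    """One column: a read-mostly main array plus a small insert delta."""
    main: array
    inserts: array = field(default_factory=lambda: array("q"))

    def value_at(self, tid):
        # The head is a dense, non-stored TID, so a lookup is plain indexing.
        if tid < len(self.main):
            return self.main[tid]
        return self.inserts[tid - len(self.main)]

    def __len__(self):
        return len(self.main) + len(self.inserts)


@dataclass
class DeltaTable:
    columns: dict
    deleted: set = field(default_factory=set)  # per-table deleted positions

    def insert(self, row):
        for name, value in row.items():
            self.columns[name].inserts.append(value)

    def delete(self, tid):
        self.deleted.add(tid)

    def scan(self, name):
        col = self.columns[name]
        return [col.value_at(t) for t in range(len(col)) if t not in self.deleted]

    def snapshot(self):
        # Copy only the deltas; the main arrays are shared, not duplicated.
        cols = {n: DeltaColumn(c.main, array("q", c.inserts))
                for n, c in self.columns.items()}
        return DeltaTable(cols, set(self.deleted))


table = DeltaTable({"price": DeltaColumn(array("q", [10, 25, 40]))})
snap = table.snapshot()
table.insert({"price": 5})
table.delete(1)
live = table.scan("price")
frozen = snap.scan("price")
```

After the insert and the delete, live reflects both changes while frozen still shows the state at snapshot time, even though the two tables share the same main array.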
XQuery. The work in the Pathfinder project 53 makes it possible to store
XML tree structures in relational tables as <pre,post> coordinates,
represented in MonetDB as a collection of BATs. In fact, the pre-numbers
are densely ascending, hence can be represented as a (nonstored) dense
TID column, saving storage space and allowing fast O(1) lookups. Only
slight extensions to the BAT Algebra were needed, in particular a series
of region-joins called “staircase joins” was added to the system for the
purpose of accelerating XPath predicates. MonetDB/XQuery provides
comprehensive support for the XQuery language, the XQuery Update
facility, and a host of specific extensions.
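
To illustrate the pre/post coordinates, here is a small Python sketch. It assumes the usual definition of the encoding: the pre rank is a node's position in document order, and the post rank is the order in which nodes are closed. The function names encode_pre_post and descendants are hypothetical. Because pre ranks are dense, the sketch uses them as list positions and stores only the post ranks, much as a nonstored TID column would. The descendant lookup is only a simplified region scan, not the staircase join algorithm.

```python
import xml.etree.ElementTree as ET


def encode_pre_post(root):
    """Return parallel lists (tags, post), both indexed by pre-order rank."""
    tags, post = [], []
    counter = 0

    def visit(node):
        nonlocal counter
        pre = len(tags)          # pre rank = position in document order
        tags.append(node.tag)
        post.append(-1)          # placeholder until the subtree is finished
        for child in node:
            visit(child)
        post[pre] = counter      # post rank = order in which nodes close
        counter += 1

    visit(root)
    return tags, post


def descendants(post, context_pre):
    """Pre ranks of all descendants of the node with rank context_pre."""
    result = []
    # Descendants follow the context node contiguously in pre order, so the
    # scan can stop at the first node whose post rank is larger.
    for pre in range(context_pre + 1, len(post)):
        if post[pre] > post[context_pre]:
            break
        result.append(pre)
    return result


doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")
tags, post = encode_pre_post(doc)
b_descendants = [tags[p] for p in descendants(post, tags.index("b"))]
```

A node v is a descendant of a context node c exactly when pre(v) > pre(c) and post(v) < post(c), so the XPath descendant axis becomes a range condition over two integer columns.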
Arrays. The Sparse Relational Array Mapping (SRAM) project maps large
(scientific) array-based datasets into MonetDB BATs, and offers a
high-level, comprehension-based query language. 54 This language is
subsequently optimized on various levels before being translated into
BAT Algebra. Array front ends are particularly useful in scientific ap-
plications.
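
The text does not spell out how SRAM lays arrays out in BATs, so the following Python sketch rests on an assumption: a rectangular two-dimensional array is linearized in row-major order into a single value column whose dense head position is computed from the indices rather than stored. The helper names array_to_column, cell, and column_slice are hypothetical and are not part of SRAM or MonetDB.

```python
from array import array


def array_to_column(nested):
    """Flatten a rectangular 2-D list into one value column (row-major)."""
    n_rows, n_cols = len(nested), len(nested[0])
    tail = array("d", (v for row in nested for v in row))
    return (n_rows, n_cols), tail


def cell(shape, tail, i, j):
    """O(1) lookup: the dense head position is computed, never stored."""
    _, n_cols = shape
    return tail[i * n_cols + j]


def column_slice(shape, tail, j):
    """Select A[:, j] as a strided read over the value column."""
    _, n_cols = shape
    return list(tail[j::n_cols])


grid = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
shape, tail = array_to_column(grid)
```

Under this assumed mapping, an element access such as cell(shape, tail, 1, 2) is a positional lookup, and a slice along one dimension is a regular stride over the same column.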