Database Reference
In-Depth Information
dataset fits entirely in memory, there were no scalability issues with the main-
memory orientation of MonetDB. The large vendor-specific schema (consisting
of 91 tables, 51 views, and 203 functions, of which 42 are table-valued) and
its extensive use of the SQL persistent storage module functionality required
an engineering effort. We had to cast vendor-specific syntax (such as identi-
fiers “datetime” vs. “timestamp”) in the schema definition into the SQL:2003
standard supported by MonetDB/SQL. We also adapted the application to
the column-store architecture and slightly modified the schema, reducing data
redundancy.
The challenge addressed in the second phase was to scale the application to
sizes beyond the main memory limit. The target dataset was a 10% subset of
approximately 150 GB. At the time of writing, the project is in its third phase
aiming to support the full-size 4 TB database. Some interesting techniques
yet to be investigated that may increase system eciency are exploring par-
allel load, interleaving of column I/O with query processing, self-organizing
indexing schemes, and exploitation of commonalities in query batches.
7.5.2 Ecient Vertical Data Access for Disk-Bound Queries
As explained in the introductory section, the major advantage of column-wise
storage comes from minimizing the data flow from disk through memory into
the CPU caches. Many scientific analytical applications involve examination
of an entire table, or a big portion of it, while at the same time spanning just
a few attributes at a time. The immediate benefit the column-wise storage
brings is that only data columns relevant for processing are fetched from disk.
In contrast, the access pattern in a row-wise storage of wide tables, such as
the PhotoObjAll table, might require hundreds of columns to be transferred
from disk, where many of the columns are irrelevant to the query. This be-
comes the major performance bottleneck for analytical queries. To illustrate
the problem, consider the following SQL query searching for moving asteroids
(Q15 in Gray et al. 58 ).
SELECT objID, sqrt(power(rowv,2) + power(colv,2)) as
velocity
FROM PhotoObj
WHERE (power(rowv,2) + power(colv,2)) > 50
and rowv > = 0 and colv > =0;
The execution plan for a row-wise storage organization involves a full table
scan, which leads to transferring entire records of 440+ columns in order to
process the four columns referred to in the query. For the 150 GB dataset, the
volume transferred is almost 50 GB. The execution plan in MonetDB involves
scans strictly limited to the columns directly referenced in the query, which
amounts to 370 MB for the example query above.
The access pattern problem in row-wise storage systems has already been
addressed by a variety of techniques, such as indexes, materialized views, and
Search WWH ::




Custom Search