manager called column buffer manager (ColumnBM) was developed. The
main difference is that MonetDB stores each BAT in a
single contiguous file, while ColumnBM partitions those files into columns (or
chunks) and applies compression to optimize the usage of the CPU cache.
Compression and decompression are managed by the buffer manager. The
figure also shows the flow corresponding to each column, from disk until it is
scanned by the query processor and passed on to the query tree. Thus, instead
of single tuples, entire vectors of values flow upward in the tree. This is called
vectorized execution. As a consequence, intermediate results need not be
materialized, as they are in MonetDB. Besides, the entire execution happens
within the CPU cache, since this is where the vectors scanned by the query
processor are taken from. As shown in Fig. 13.5, main memory is only used as
an I/O buffer managed by ColumnBM. This is called in-cache processing.
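The idea of vectors flowing upward through the query tree can be illustrated with a minimal Python sketch. It is not MonetDB/X100 code: the operator names (scan, select_greater, total), the sample data, and the vector size are all hypothetical, and the vector size only stands in for a size chosen to fit the CPU cache. Each operator consumes and produces whole vectors, so no complete intermediate column is ever built.

```python
from typing import Iterator, List

VECTOR_SIZE = 4  # hypothetical; stands in for a cache-sized vector


def scan(column: List[int], vector_size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Leaf operator: hand the column upward one vector at a time."""
    for start in range(0, len(column), vector_size):
        yield column[start:start + vector_size]


def select_greater(vectors: Iterator[List[int]], bound: int) -> Iterator[List[int]]:
    """Filter each incoming vector and pass on the qualifying values."""
    for vec in vectors:
        out = [v for v in vec if v > bound]
        if out:  # skip vectors in which nothing qualified
            yield out


def total(vectors: Iterator[List[int]]) -> int:
    """Root operator: aggregate the vectors as they arrive."""
    return sum(sum(vec) for vec in vectors)


# Hypothetical column; the pipeline sums all values greater than 10.
prices = [5, 12, 7, 30, 18, 2, 25, 9, 40]
result = total(select_greater(scan(prices), 10))
print(result)
```

Because the operators are generators, only one vector per operator is alive at any time, which loosely mirrors the absence of materialized intermediate results described above.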
As occurs with many systems, a problem with vertical storage is an
increased update cost: a single row update or delete must perform one I/O
for each column. MonetDB/X100 avoids this by considering the vertical
fragments as objects that do not change. For this, updates are applied to data
in so-called delta structures (i.e., structures that store new data). A delete
is handled by adding the tuple identifier to a deletion list and an insert as
an append in separate delta columns. ColumnBM stores all delta columns
together. Thus, each of these two operations requires only one I/O operation. Updates are
treated simply as a deletion followed by an insertion. When the column size
exceeds a threshold, data storage must be reorganized, which consists of
bringing the vertical storage up to date and emptying the delta columns.
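The delta mechanism just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ColumnBM implementation: the class DeltaTable and its methods are hypothetical, tuple identifiers are assumed to be positions in the stable columns followed by the delta columns, and the threshold is assumed to apply to the size of the delta columns.

```python
from typing import Dict, List, Set


class DeltaTable:
    """Unchanging vertical fragments plus a deletion list and delta columns."""

    def __init__(self, columns: Dict[str, List], threshold: int = 3) -> None:
        self.stable: Dict[str, List] = {n: list(v) for n, v in columns.items()}
        self.delta: Dict[str, List] = {n: [] for n in columns}
        self.deleted: Set[int] = set()   # deletion list of tuple identifiers
        self.threshold = threshold       # assumed to bound the delta size

    @staticmethod
    def _size(part: Dict[str, List]) -> int:
        return len(next(iter(part.values()), []))

    def delete(self, tid: int) -> None:
        """A delete only records the tuple identifier."""
        self.deleted.add(tid)

    def insert(self, row: Dict[str, object]) -> None:
        """An insert is an append to the delta columns."""
        for name in self.delta:
            self.delta[name].append(row[name])

    def update(self, tid: int, row: Dict[str, object]) -> None:
        """An update is a deletion followed by an insertion."""
        self.delete(tid)
        self.insert(row)

    def rows(self) -> List[tuple]:
        """Current table contents: stable plus delta, minus deleted tuples."""
        names = list(self.stable)
        merged = {c: self.stable[c] + self.delta[c] for c in names}
        n = self._size(merged)
        return [tuple(merged[c][t] for c in names)
                for t in range(n) if t not in self.deleted]

    def needs_reorganization(self) -> bool:
        return self._size(self.delta) > self.threshold

    def reorganize(self) -> None:
        """Bring the stable columns up to date and empty the deltas.
        Note: in this sketch tuple identifiers are renumbered."""
        names = list(self.stable)
        live = self.rows()
        self.stable = {c: [r[i] for r in live] for i, c in enumerate(names)}
        self.delta = {c: [] for c in names}
        self.deleted.clear()


# Hypothetical usage with made-up data.
t = DeltaTable({"id": [1, 2, 3], "city": ["Oslo", "Lima", "Rome"]}, threshold=1)
t.delete(1)                               # removes (2, "Lima")
t.update(0, {"id": 1, "city": "Bergen"})  # delete tid 0, append new version
t.insert({"id": 4, "city": "Kyoto"})
before = t.rows()
if t.needs_reorganization():
    t.reorganize()
```

Readers pay for the cheap writes: every scan has to merge the stable and delta data and skip deleted identifiers until the next reorganization folds the changes back into the vertical storage.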
13.5.4 SAP HANA
The SAP approach to business intelligence, known as HANA, 5 is based on
two main components:
1. The SAP HANA database (also called SAP IMDBS), a hybrid IMDBS
that combines row-based, column-based, and object-based technologies,
optimized for taking advantage of parallel processing.
2. The SAP HANA appliance (SAP HANA), used for analyzing large volumes
of data in real time without the need to materialize aggregations. It is a
combination of hardware and software delivered by SAP in cooperation
with hardware partners, like IBM.
The core of the SAP HANA database consists of two relational database engines.
The first one is a column-based engine, holding tables with large amounts of
data that can be aggregated in real time and used in analytical operations.
The second one is a row-based engine, optimized for row operations, such
5 http://www.saphana.com