New Data Warehouse Technologies - Data Warehouse Systems: Design and Implementation


manager called column buffer manager (ColumnBM) was developed. The main
difference between the two is that MonetDB stores each BAT in a single
contiguous file, while ColumnBM partitions those files into column chunks
and applies compression to optimize the usage of the CPU cache. Compression
and decompression are managed by the buffer manager. The figure also shows
the flow corresponding to each column, from disk until it is scanned by the
query processor and passed on to the query tree. Thus, instead of single
tuples, entire vectors of values flow upward in the tree. This is called
vectorized execution. As a consequence, materialization of intermediate
results as in MonetDB is not needed. Besides, the entire execution happens
within the CPU cache, since this is where the vectors scanned by the query
processor are taken from. As shown in Fig. 13.5, main memory is only used as
an I/O buffer managed by ColumnBM. This is called in-cache processing.
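
The flow described above can be illustrated with a minimal Python sketch. It
is not the MonetDB/X100 implementation: the names ChunkStore, scan, select,
and total are hypothetical, the chunk and vector sizes are arbitrary, and
zlib merely stands in for whatever compression scheme the buffer manager
actually applies. The point is only that the store hands out decompressed
chunks, and that operators pass small vectors of values upward instead of
single tuples or fully materialized intermediate results.

```python
import zlib
from array import array
from typing import Callable, Iterator, List

Vector = List[int]


class ChunkStore:
    """Hypothetical stand-in for a buffer manager holding one column
    as a sequence of compressed chunks (zlib is an assumption)."""

    def __init__(self, values: List[int], chunk_size: int) -> None:
        self.chunks = [
            zlib.compress(array("q", values[i:i + chunk_size]).tobytes())
            for i in range(0, len(values), chunk_size)
        ]

    def read_chunk(self, index: int) -> List[int]:
        # Decompression happens inside the store, not in the operators.
        out = array("q")
        out.frombytes(zlib.decompress(self.chunks[index]))
        return out.tolist()


def scan(store: ChunkStore, vector_size: int) -> Iterator[Vector]:
    """Leaf operator: cut each decompressed chunk into small vectors."""
    for index in range(len(store.chunks)):
        chunk = store.read_chunk(index)
        for start in range(0, len(chunk), vector_size):
            yield chunk[start:start + vector_size]


def select(child: Iterator[Vector],
           predicate: Callable[[int], bool]) -> Iterator[Vector]:
    """Filter one vector at a time and pass the survivors upward."""
    for vector in child:
        kept = [value for value in vector if predicate(value)]
        if kept:
            yield kept


def total(child: Iterator[Vector]) -> int:
    """Root operator: aggregate vector by vector, so no intermediate
    result is ever materialized in full."""
    result = 0
    for vector in child:
        result += sum(vector)
    return result
```

A query such as "sum of all values greater than 10" is then written as
total(select(scan(store, 2), lambda v: v > 10)). In a real system the
vector size would be chosen so that the vectors in flight fit in the CPU
cache.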

As occurs with many systems, a problem with vertical storage is an
increased update cost: a single row update or delete must perform one I/O
for each column. MonetDB/X100 avoids this by treating the vertical
fragments as objects that do not change. Updates are instead applied to
so-called delta structures (i.e., structures that store new data). A delete
is handled by adding the tuple identifier to a deletion list, and an insert
by appending to separate delta columns. ColumnBM stores all delta columns
together. Thus, each of these operations requires only one I/O operation.
An update is treated simply as a deletion followed by an insertion. When
the size of the delta columns exceeds a threshold, data storage must be
reorganized, which consists of bringing the vertical storage up to date and
emptying the delta columns.
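
The delta mechanism can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not the ColumnBM code: the class
DeltaTable, its method names, and the threshold expressed as a number of
delta rows are all hypothetical. Tuple identifiers are assumed to be
positions, with inserted rows numbered after the base rows.

```python
from typing import Dict, Iterator, List, Set, Tuple


class DeltaTable:
    """Immutable vertical fragments plus a deletion list and delta columns."""

    def __init__(self, columns: Dict[str, List], threshold: int = 3) -> None:
        self.names = list(columns)
        # Base fragments are stored as tuples: they are never changed in place.
        self.base = {name: tuple(columns[name]) for name in self.names}
        self.deleted: Set[int] = set()   # deletion list of tuple identifiers
        self.delta: List[Tuple] = []     # delta columns, kept together row-wise
        self.threshold = threshold       # assumed: max number of delta rows

    def _base_len(self) -> int:
        return len(self.base[self.names[0]]) if self.names else 0

    def delete(self, tid: int) -> None:
        # One small write, regardless of the number of columns.
        self.deleted.add(tid)

    def insert(self, row: Dict[str, object]) -> None:
        # One append to the delta area instead of one write per column.
        self.delta.append(tuple(row[name] for name in self.names))
        if len(self.delta) > self.threshold:
            self.reorganize()

    def update(self, tid: int, row: Dict[str, object]) -> None:
        # An update is a deletion followed by an insertion.
        self.delete(tid)
        self.insert(row)

    def rows(self) -> Iterator[Tuple]:
        """Current table contents: base minus deletions, plus live delta rows."""
        base_len = self._base_len()
        for tid in range(base_len):
            if tid not in self.deleted:
                yield tuple(self.base[name][tid] for name in self.names)
        for offset, row in enumerate(self.delta):
            if base_len + offset not in self.deleted:
                yield row

    def reorganize(self) -> None:
        """Rebuild the vertical fragments and empty the delta structures.
        Tuple identifiers are renumbered as a side effect."""
        live = list(self.rows())
        self.base = {
            name: tuple(row[pos] for row in live)
            for pos, name in enumerate(self.names)
        }
        self.deleted.clear()
        self.delta.clear()
```

Readers always go through rows(), which merges the three structures, so
the base fragments can stay untouched until the next reorganization.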

13.5.4 SAP HANA

The SAP approach to business intelligence, known as HANA,⁵ is based on
two main components:

1. The SAP HANA database (also called SAP IMDBS), a hybrid IMDBS
   that combines row-based, column-based, and object-based technologies,
   optimized for taking advantage of parallel processing.

2. The SAP HANA appliance (SAP HANA), used for analyzing large volumes
   of data in real time without the need to materialize aggregations. It is a
   combination of hardware and software delivered by SAP in cooperation
   with hardware partners, like IBM.

The core of the SAP HANA database consists of two relational database
engines. The first one is a column-based engine, holding tables with large
amounts of data that can be aggregated in real time and used in analytical
operations. The second one is a row-based engine, optimized for row
operations, such

5 http://www.saphana.com
