New Data Warehouse Technologies - Data Warehouse Systems: Design and Implementation

called caching. Note, however, that caching only speeds up database reads,

while updates or writes must still be written through the cache to disk.

Therefore, the performance benefit only applies to a subset of database tasks.

In addition, managing the cache is itself a process that requires substantial

memory and CPU resources. An IMDBS reduces these data transfers to a

minimum, since the data reside mainly in memory. It follows that the optimization

objectives of disk-based database systems are opposed to those of an IMDBS.

Traditional DBMSs try to minimize input/output (I/O) using the cache,

consuming CPU cycles to maintain this cache. In addition, as we have seen,

they keep redundant data, for example, in index structures, to enable direct

access to records without the need to go down to the actual data. In

contrast, an IMDBS is designed with the optimization goal of reducing both

memory consumption and CPU cycles.
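
The read/write asymmetry of caching described above can be illustrated with a

minimal sketch. It is not taken from any particular DBMS: the class name

WriteThroughCache and its counters are hypothetical, and a plain dictionary

stands in for the disk.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a slow 'disk' (a dict here)."""

    def __init__(self, disk):
        self.disk = disk        # stands in for nonvolatile storage
        self.cache = {}         # in-memory copies of recently used records
        self.disk_reads = 0     # number of reads that had to go to disk
        self.disk_writes = 0    # number of writes that reached the disk

    def read(self, key):
        if key in self.cache:   # cache hit: no disk access needed
            return self.cache[key]
        self.disk_reads += 1    # cache miss: fetch from disk and remember it
        value = self.disk[key]
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.cache[key] = value
        self.disk[key] = value  # every write is written through to disk
        self.disk_writes += 1


disk = {"a": 1}
db = WriteThroughCache(disk)
for _ in range(3):
    db.read("a")                # only the first read touches the disk
db.write("a", 2)
db.write("a", 3)                # both writes touch the disk
print(db.disk_reads, db.disk_writes)  # prints: 1 2
```

Three reads cost a single disk access, whereas each of the two writes costs

one, which is why the benefit of caching is limited to read-dominated tasks.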

Like traditional DBMSs, typical IMDBSs support the ACID properties,

namely, atomicity, consistency, isolation, and durability. The first three

are supported as in traditional DBMSs. Since the main memory is volatile,

durability is supported by transaction logging, in which snapshots of the

database are taken periodically at certain time instants (called savepoints or

checkpoints, depending on the technology and the vendor) and are written

to nonvolatile media. If the system fails and must be restarted, the database

either rolls back to the last completed transaction or rolls forward to complete

any transaction that was in progress when the system failed. IMDBSs also

support durability by maintaining one or more copies of the database, which,

as in traditional systems, is called replication. Nonvolatile RAM provides

another means of in-memory database persistence.
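
The interplay between savepoints and the transaction log can be sketched as

follows. This is a simplified illustration under stated assumptions, not the

recovery protocol of any specific product: the class InMemoryStore and its

methods are hypothetical, and ordinary Python objects stand in for the

nonvolatile media that would hold the snapshot and the log.

```python
import copy


class InMemoryStore:
    """Toy in-memory store with a savepoint and a redo log."""

    def __init__(self):
        self.data = {}      # volatile main-memory contents
        self.snapshot = {}  # stands in for the last savepoint on disk
        self.log = []       # stands in for the durable log since that savepoint

    def commit(self, changes):
        # Record the committed changes in the log before applying them.
        self.log.append(dict(changes))
        self.data.update(changes)

    def savepoint(self):
        # Persist a full snapshot; older log records are no longer needed.
        self.snapshot = copy.deepcopy(self.data)
        self.log.clear()

    def crash(self):
        # Main memory is volatile: its contents are lost on failure.
        self.data = {}

    def recover(self):
        # Reload the last savepoint, then roll forward through the log.
        self.data = copy.deepcopy(self.snapshot)
        for changes in self.log:
            self.data.update(changes)


store = InMemoryStore()
store.commit({"x": 1})
store.savepoint()
store.commit({"y": 2})   # committed after the savepoint, only in the log
store.crash()
store.recover()
print(store.data)        # prints: {'x': 1, 'y': 2}
```

In this sketch, changes that were never committed leave no log record, so

they simply do not reappear after recovery.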

Finally, disk-based storage can be applied selectively in an IMDBS. For

example, certain record types can be written to disk, while others are

managed entirely in memory. Functions specific to disk-based databases, such

as cache management, are applied only to records stored on disk, minimizing

the impact of these activities on performance and CPU demands.

Figure 13.4 depicts the typical data storage architecture of an IMDBS. 1

The database is stored in main memory, and it is composed of three main

parts. The main store contains data stored in a column-oriented fashion.

For query optimization reasons, some products also store together groups

of columns that are usually accessed together. These are called combined

columns. The buffer store is a write-optimized data structure that holds

data that have not yet been moved to the main store. This means that a

query may need data from both the main store and the buffer. The special

data structure of the buffer normally requires more space per record than

the main store. Thus, data are periodically moved from the buffer to the

main store, a process that requires a merge operation. There are also data

structures used to support special features. Examples are inverted indexes

1 This figure is inspired by the SAP HANA architecture (described later in the

chapter), although most IMDBSs follow a similar architecture.
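
The relationship between the main store, the buffer store, and the merge

operation can be sketched as follows. This is a deliberately simplified

illustration and not SAP HANA's actual implementation: the class ColumnTable

and its methods are hypothetical, the main store is modeled as one list per

column, and the buffer as an append-only list of rows.

```python
class ColumnTable:
    """Toy table with a column-oriented main store and a row-wise buffer."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.main = {c: [] for c in self.columns}  # one list per column
        self.buffer = []                           # append-only list of rows

    def insert(self, row):
        # Writes go to the buffer only, which makes them cheap.
        self.buffer.append(tuple(row[c] for c in self.columns))

    def scan(self, column):
        # A query must combine values from the main store and the buffer.
        idx = self.columns.index(column)
        return self.main[column] + [r[idx] for r in self.buffer]

    def merge(self):
        # Move buffered rows into the column lists and empty the buffer.
        for row in self.buffer:
            for column, value in zip(self.columns, row):
                self.main[column].append(value)
        self.buffer = []


t = ColumnTable(["city", "sales"])
t.insert({"city": "Brussels", "sales": 10})
t.insert({"city": "Paris", "sales": 20})
t.merge()                                  # both rows are now in the main store
t.insert({"city": "Rome", "sales": 5})     # this row is still in the buffer
total = sum(t.scan("sales"))
print(total)                               # prints: 35
```

The scan returns the correct total although one row has not yet been merged,

which reflects the point that a query may need data from both stores.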
