Database Reference
workloads. That paper argues that the primary design criterion for transactional,
write-oriented databases is to “minimize the portion of stored data that must be
locked for exclusive access and the length of time that locks are held.” According
to Reference 1, the consequences of adopting this criterion have generally led to
the following set of design rules for transactional, write-oriented databases:
Since data are usually accessed and modified one record at a time, data
should be stored row-wise to allow each record to be updated by a single
write operation. Also, data should be stored in small disk pages to
minimize the amount of data transferred between memory and disk and
to minimize the part of the disk file that needs to be locked during a
transaction.
Indexes should be restricted to a few attributes to avoid locking the entire
index tree structures on disk and thereby denying access to whole
sets of rows, which might otherwise become necessary when indexes are
updated.
Compression of data is usually not profitable because there is often a mix
of different data types and unrelated data values in each row. The CPU
time required for compression and decompression will therefore not be
recovered by reduced data transfer volume.
Adding or deleting attributes and indexes is likely to be expensive since all or
a large part of the data pages used by the parent table may be affected.
Finally, updates of an attribute according to even a simple predicate are
likely to be costly because the entire row must be read and written
when a single attribute is to be updated.
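The cost of a single-attribute update under row-wise storage can be illustrated with a minimal Python sketch. The record layout (`id`, `balance`, `name`), the helper names, and the in-memory `io.BytesIO` "file" are all hypothetical choices made for illustration; they are not taken from Reference 1, and a real engine would add pages, locks, and logging on top.

```python
import io
import struct

# Hypothetical fixed-width row: id (int32), balance (float64), name (16 bytes).
# "<" selects standard sizes with no padding, so each row is 28 bytes.
RECORD = struct.Struct("<id16s")

def write_record(f, slot, rec):
    """Store a whole record with one seek and one write operation."""
    f.seek(slot * RECORD.size)
    f.write(RECORD.pack(*rec))

def read_record(f, slot):
    """Read the whole record stored in the given slot."""
    f.seek(slot * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))

def update_balance(f, slot, new_balance):
    """Change one attribute; the full row is still read and written back."""
    rec_id, _, name = read_record(f, slot)
    write_record(f, slot, (rec_id, new_balance, name))
    return 2 * RECORD.size  # bytes moved: one full read plus one full write

# An in-memory buffer stands in for the disk file (assumption for the sketch).
table = io.BytesIO()
for i in range(3):
    write_record(table, i, (i, 100.0, b"user%d" % i))

moved = update_balance(table, 1, 250.0)
```

Changing the 8-byte `balance` field moves two full 28-byte rows in this sketch, which is the effect the last rule describes; the first rule shows in `write_record`, where a complete record is stored by a single write.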
Once the primary criterion for the internal design of a database system
becomes the achievement of high performance on complex analytics tasks, this
set of rules should be changed as follows:
Storing data column-wise instead of row-wise makes it possible to avoid
touching those disk pages of a table that are not affected by a
query, which may yield considerable performance improvements. Cache
efficiency will be enhanced because commonly accessed columns will
tend to stay in the cache.
Data are likely to be read many more times than they are written or updated,
making CPU time “investment” in the creation of efficient storage structures
more likely to be profitable. Also, data should be stored in large
pages so that a large number of relevant data items can be retrieved in
a single read operation, resulting in a high overall “hit ratio.” Row-wise
storage, on the other hand, tends to disfavor large page sizes since each
read operation also drags into memory attributes that are not relevant
to the query in question, resulting in a low overall hit ratio.
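The difference in pages touched and in hit ratio can be made concrete with a rough Python model. Everything in it is an assumption made for illustration: the attribute sizes, the 8 KB page size, the function names, and the definition of hit ratio as the fraction of bytes read that the query actually needs. It also ignores page headers, compression, and caching.

```python
import math

def row_store_scan(n_rows, attr_sizes, wanted, page_size):
    """Pages read and hit ratio when whole rows are packed into pages."""
    row_size = sum(attr_sizes.values())
    rows_per_page = page_size // row_size
    pages = math.ceil(n_rows / rows_per_page)
    relevant = n_rows * sum(attr_sizes[a] for a in wanted)
    return pages, relevant / (pages * page_size)

def column_store_scan(n_rows, attr_sizes, wanted, page_size):
    """Pages read and hit ratio when each attribute has its own pages."""
    pages = sum(
        math.ceil(n_rows / (page_size // attr_sizes[a])) for a in wanted
    )
    relevant = n_rows * sum(attr_sizes[a] for a in wanted)
    return pages, relevant / (pages * page_size)

# Hypothetical table: bytes per attribute, with one wide, rarely queried column.
sizes = {"id": 8, "price": 8, "comment": 112}

# A query that only needs the "price" attribute of every row.
row_pages, row_hit = row_store_scan(100_000, sizes, ["price"], 8192)
col_pages, col_hit = column_store_scan(100_000, sizes, ["price"], 8192)
```

Under these assumptions the row layout reads 1563 pages, of which only about 6% of the bytes are relevant, while the column layout reads 98 pages that consist almost entirely of relevant data. Enlarging the page does not improve the row layout's ratio in this model, since every page still carries the unneeded `id` and `comment` bytes.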