Database Reference
In-Depth Information
7.5 Indexes for Data Warehouses
A major concern in database management systems (DBMSs) is to provide
fast access to data. Given a query, a relational DBMS attempts to choose the
best possible access path to the data. A popular way to speed data access is
known as indexing. An index provides a quick way to locate data of interest.
Almost all the queries asking for data that satisfy a certain condition are
answered with the help of some index.
As an example, consider the following SQL query:
SELECT *
FROM Employee
WHERE EmployeeKey = 1234
Without an index on attribute EmployeeKey , we should perform a complete
scan of table Employee (unless it is ordered), whereas with the help of an
index over such attribute, a single disk block access will do the job since this
attribute is a key for the relation.
Although indexing provides advantages for fast data access, it has a
drawback: almost every update on an indexed attribute also requires an index
update. This suggests that designers and database administrators should
be careful on defining indexes because their proliferation can lead to bad
updating performance.
The most popular indexing technique in relational databases is the B + -
tree. All major vendors provide support for some variation of B + -tree indexes.
AB + -tree index is a multilevel structure containing a root node and pointers
to the next lower level in a tree. The lowest level is formed by the leaves of
the tree, which in general contain a record identifier for the corresponding
data. Often, the size of each node equals the size of a block, and each node
holds a large number of keys, so the resulting tree has a low number of levels
and the retrieval of a record can be very fast. This works well if the attribute
being indexed is a key of the file or if the number of duplicate values is low.
We have seen that queries submitted to an OLAP system are of a very
different nature than those of an OLTP system. Therefore, new indexing
strategies are needed for OLAP systems. Some indexing requirements for a
data warehouse system are as follows:
￿ Symmetric partial match queries: Most OLAP queries involve partial
ranges. An example is the query “Total sales from January 2006 to
December 2010.” As queries can ask for ranges over any dimension, all
the dimensions of the data cube should be symmetrically indexed so that
they can be searched simultaneously.
￿ Indexing at multiple levels of aggregation: Since summary tables can be
large or queries may ask for particular values of aggregate data, summary
tables must be indexed in the same way as base nonaggregated tables.
Search WWH ::




Custom Search