Physical Data Warehouse Design - Data Warehouse Systems: Design and Implementation

Database Reference

In-Depth Information

7.5 Indexes for Data Warehouses

A major concern in database management systems (DBMSs) is to provide

fast access to data. Given a query, a relational DBMS attempts to choose the

best possible access path to the data. A popular way to speed data access is

known as indexing. An index provides a quick way to locate data of interest.

Almost all the queries asking for data that satisfy a certain condition are

answered with the help of some index.

As an example, consider the following SQL query:

SELECT *

FROM Employee

WHERE EmployeeKey = 1234

Without an index on attribute EmployeeKey , we should perform a complete

scan of table Employee (unless it is ordered), whereas with the help of an

index over such attribute, a single disk block access will do the job since this

attribute is a key for the relation.

Although indexing provides advantages for fast data access, it has a

drawback: almost every update on an indexed attribute also requires an index

update. This suggests that designers and database administrators should

be careful on defining indexes because their proliferation can lead to bad

updating performance.

The most popular indexing technique in relational databases is the B + -

tree. All major vendors provide support for some variation of B + -tree indexes.

AB + -tree index is a multilevel structure containing a root node and pointers

to the next lower level in a tree. The lowest level is formed by the leaves of

the tree, which in general contain a record identifier for the corresponding

data. Often, the size of each node equals the size of a block, and each node

holds a large number of keys, so the resulting tree has a low number of levels

and the retrieval of a record can be very fast. This works well if the attribute

being indexed is a key of the file or if the number of duplicate values is low.

We have seen that queries submitted to an OLAP system are of a very

different nature than those of an OLTP system. Therefore, new indexing

strategies are needed for OLAP systems. Some indexing requirements for a

data warehouse system are as follows:

Symmetric partial match queries: Most OLAP queries involve partial

ranges. An example is the query “Total sales from January 2006 to

December 2010.” As queries can ask for ranges over any dimension, all

the dimensions of the data cube should be symmetrically indexed so that

they can be searched simultaneously.

Indexing at multiple levels of aggregation: Since summary tables can be

large or queries may ask for particular values of aggregate data, summary

tables must be indexed in the same way as base nonaggregated tables.

Search WWH ::

Custom Search

Home