Introduction - Relational Database Index Design and the Optimizers

Database Reference

In-Depth Information

is much higher than it was 20 years ago. A reasonable request for a new index

should not be rejected intuitively. With current disks, an indexed volatile column

may become an issue only if the column is updated perhaps more than 10 times

a second ; such columns are not very common.

SYSTEMATIC INDEX DESIGN

The first attempts toward an index design method originate from the 1960s. At

that time, textbooks recommended a matrix for predicting how often each field

(column) is read and updated and how often the records (rows) containing these

fields are inserted and deleted. This led to a list of columns to be indexed. The

indexes were generally assumed to have only a single column, and the objective

was to minimize the number of disk input/outputs (I/Os) during peak time. It is

amazing that this approach is still being mentioned in recent topics, although a

few, somewhat more realistic writers, do admit that the matrix should only cover

the most common transactions.

This column activity matrix approach may explain the column-oriented think-

ing that can be found even in recent textbooks and database courses, such as

consider indexing columns with these properties and avoid indexing columns with

those properties .

In the 1980s, the column-oriented approach began to lose ground to a

response-oriented approach . Enlightened DBAs started to realize that the objec-

tive of indexing should be to make all database calls fast enough, given the

hardware capacity constraints. The pseudo-relational DBMS of IBM S/38 (later

AS/400, then the iSeries) was the vanguard of this attitude. It automatically built

a good index for each database call. This worked well with simple applications.

Today, many products propose indexes for each SQL call, but indexes are not

created automatically, apart from primary key indexes and, sometimes, foreign

key indexes.

As applications became more complex and databases much larger, the impor-

tance and complexity of index design became obvious. Ambitious projects were

undertaken to develop tools for automating the design process. The basic idea

was to collect a sample of production workload and then generate a set of index

candidates for the SELECT statements in the workload. Simple evaluation for-

mulas or a cost-based optimizer would then be used to decide which indexes

were the most valuable. This sort of product has become available over the last

few years but has spread rather slower than expected. Possible reasons for this

are discussed in Chapter 16.

Systematic index design consists of two processes as shown in Figure 1.2.

First, it is necessary to find the SELECTs that are, or will be, too slow with the

current indexes, at least with the worst input; for example, “the largest customer”

or “the oldest date”. Second, indexes have to be designed to make the slow

SELECTs fast enough without making other SQL calls noticeably slower. Neither

of these tasks is trivial.

Search WWH ::

Custom Search

Home