Optimizers Are Not Perfect - Relational Database Index Design and the Optimizers

Database Reference

In-Depth Information

indexes without making any measurements; the current optimizers do not issue

any SELECT calls to the database when choosing the access path. In a limited

way, the current optimizers validate their assumptions at execution time; they

may, for instance, decide to turn on sequential prefetch for an index or a table

after observing the page access pattern. In the future, the best optimizers may

also ascertain the real filter factors at execution time.

With a simple SELECT, the most important decisions to be made are the

choice of index (or indexes in the case of multiple index access) and the way

it is used (matching columns, screening columns, sequential prefetch). For joins,

the join method and table access order are also critical.

All serious relational DBMSs now have a cost-based optimizer; they identify

a number of reasonable alternatives and make cost estimates for them. The cost

estimate is largely a prediction of the local response time—a weighted average

of the CPU time and the synchronous I/O time. The estimate is based on statistics

collected by a utility, for example, the size of the tables and their indexes together

with the distribution of the values of selected columns. Many of these value

distributions are optional. The maximum set may include the following data per

column or column combination:

ž Number of distinct values ( cardinality )

ž Second largest and the second smallest value

ž N most common values and their frequencies (the number of rows with a

given value)

ž Histogram with N bars (2% below 10, 5% below 20, etc., or the percentile

values 2%, 4%, 6%, etc.)

The cost formulas of the best optimizers contain tens of variables (compared

with the two variables used in the QUBE—TR and TS). These variables may

include hardware information as well, such as processor speed, disk drive speed,

and the size of the database buffer pool. The oldest cost-based optimizers have

already had their 21st birthday, and the developers have improved them continu-

ally. Yet they sometimes choose a totally wrong access path, even for an innocent

looking SQL statement. We will now discuss why this is so.

Essentially, there are two basic problems that we have to learn to live with;

we will deal with each of them in turn:

1. Optimizers do not always see the best alternative.

2. Optimizers' cost estimates may be very wrong.

OPTIMIZERS DO NOT ALWAYS SEE THE BEST

ALTERNATIVE

Matching and Screening Problems

Difficult predicates , defined in Chapter 6 as those that cannot participate in defin-

ing the index slice , used to be the most common reason for optimizer problems.

This type of problem has become less frequent over the years for two reasons:

Search WWH ::

Custom Search

Home