Optimizers Are Not Perfect - Relational Database Index Design and the Optimizers

Database Reference

In-Depth Information

A cost-based optimizer does not make its decision based on the number of

matching columns. It makes cost estimates for several feasible alternatives. In

this case, when it considers the new index for SQLB, it needs to know the filter

factor for the compound predicate C1

:C3.

If the actual cardinality for column group (C1,C3) is 500, the average filter

factor of the matching predicates will be 1 / 500 = 0 . 2%. The number of index

and table touches is then 0

=

:C1 AND C3

=

4000 for each. The worst input

may result in many more touches. Likewise, if the actual cardinality for column

group (C1,C2,C3,C4,C5,C6,C7,C8), the first eight columns of the primary key,

is 20,000, the average filter factor for the whole WHERE clause is 1/20,000, and

the number of touches (because all eight simple predicates are matching) is only

100 + 100. If the optimizer knew all this, it would not use the new index.

What does the optimizer know? It depends on the product and the options

chosen; but, often, by default, the optimizer only knows the cardinality of the

first index key (new index: C1), together with the cardinality of the whole index

key (new index: C1,C3,C10). When the optimizer needs to know the cardinality

of (C1,C3), it must use interpolation between these two values. The interpolation

formulas tend to be confidential, but it is not necessary to know the details in

order to appreciate that the optimizer's estimates can be very wrong in a case

like this. The optimizer needs help.

Many DBMS products provide an option in the utility that builds the statistics

for the optimizer (RUNSTATS, ANALYZE, or equivalent) to collect information

about partial index keys, such as cardinality, histograms, or the most common

and most rare values. It may be wise to choose the cardinality option for all

partial keys of all indexes, at the very least. Requesting statistics options only for

some indexes, columns, or column groups sometimes makes the situation worse;

it may be preferable for the optimizer to be wrong in a consistent way, rather

than knowing a lot about some objects while using the default values for others.

We encountered one case, for example, where an optimizer chose a wrong index

because, even though it knew that the cardinality of CNO, a foreign key, was

5,000,000, it did not know the cardinality of CNAME, and therefore used the

default cardinality, 25. The optimizer would have made the right choice if it had

used the default cardinality for both CNO and CNAME.

But then, in real life, it is not easy to follow the rather obvious recommenda-

tion of equal treatment for optional statistics. Detailed statistics are often created

for a column because the default filter factors result in a wrong access path. Is

it possible, then, to do the same for all related columns? Maybe not. It would

be preferable for the DBMS to make intelligent recommendations about optional

statistics. After all, the optimizer knows when it must resort to a default filter

factor. The optimizer could also monitor the real filter factor and compare it with

its assumption. The most advanced products are already taking the first steps in

this direction (e.g., DB2 for LUW 8.2 and Oracle 10 g). Any improvement in

this area would be very welcome because the need for hints would be reduced.

It is better to address the cause than the symptoms.

.

002

×

2

,

000

,

000

=

Relational Database Index Design and the Optimizers

Search WWH ::

Custom Search

Home