Database Reference
In-Depth Information
A cost-based optimizer does not make its decision based on the number of
matching columns. It makes cost estimates for several feasible alternatives. In
this case, when it considers the new index for SQLB, it needs to know the filter
factor for the compound predicate C1
:C3.
If the actual cardinality for column group (C1,C3) is 500, the average filter
factor of the matching predicates will be 1 / 500 = 0 . 2%. The number of index
and table touches is then 0
=
:C1 AND C3
=
4000 for each. The worst input
may result in many more touches. Likewise, if the actual cardinality for column
group (C1,C2,C3,C4,C5,C6,C7,C8), the first eight columns of the primary key,
is 20,000, the average filter factor for the whole WHERE clause is 1/20,000, and
the number of touches (because all eight simple predicates are matching) is only
100 + 100. If the optimizer knew all this, it would not use the new index.
What does the optimizer know? It depends on the product and the options
chosen; but, often, by default, the optimizer only knows the cardinality of the
first index key (new index: C1), together with the cardinality of the whole index
key (new index: C1,C3,C10). When the optimizer needs to know the cardinality
of (C1,C3), it must use interpolation between these two values. The interpolation
formulas tend to be confidential, but it is not necessary to know the details in
order to appreciate that the optimizer's estimates can be very wrong in a case
like this. The optimizer needs help.
Many DBMS products provide an option in the utility that builds the statistics
for the optimizer (RUNSTATS, ANALYZE, or equivalent) to collect information
about partial index keys, such as cardinality, histograms, or the most common
and most rare values. It may be wise to choose the cardinality option for all
partial keys of all indexes, at the very least. Requesting statistics options only for
some indexes, columns, or column groups sometimes makes the situation worse;
it may be preferable for the optimizer to be wrong in a consistent way, rather
than knowing a lot about some objects while using the default values for others.
We encountered one case, for example, where an optimizer chose a wrong index
because, even though it knew that the cardinality of CNO, a foreign key, was
5,000,000, it did not know the cardinality of CNAME, and therefore used the
default cardinality, 25. The optimizer would have made the right choice if it had
used the default cardinality for both CNO and CNAME.
But then, in real life, it is not easy to follow the rather obvious recommenda-
tion of equal treatment for optional statistics. Detailed statistics are often created
for a column because the default filter factors result in a wrong access path. Is
it possible, then, to do the same for all related columns? Maybe not. It would
be preferable for the DBMS to make intelligent recommendations about optional
statistics. After all, the optimizer knows when it must resort to a default filter
factor. The optimizer could also monitor the real filter factor and compare it with
its assumption. The most advanced products are already taking the first steps in
this direction (e.g., DB2 for LUW 8.2 and Oracle 10 g). Any improvement in
this area would be very welcome because the need for hints would be reduced.
It is better to address the cause than the symptoms.
.
002
×
2
,
000
,
000
=
Search WWH ::




Custom Search