Optimizers Are Not Perfect - Relational Database Index Design and the Optimizers

Database Reference

In-Depth Information

Should a single-column index be dropped if the cardinality of the index column

is one?

For example, assume the system tables are being scanned to investigate the

current indexes. A single-column index is found with a cardinality of one. Should

this index be dropped?

It is certainly worth while spending some time considering this situation.

Our comments will be found on the ftp site for this topic. See Preface for details.

Finding an access path in a pool is much faster than repeating the access

path selection process with cost estimates. Current hardware makes an SQL pool

of several gigabytes feasible. If each SQL statement together with its access

path takes a few kilobytes, the pool may have space for a million access paths.

This is of critical importance when using a system like SAP or Peoplesoft, not

because of skewed distributions but because without reuse of saved access paths,

every SELECT would incur the overhead of estimating the cost of alternative

access paths.

Correlated Columns

WHERE MAKE = :MAKE AND MODEL = :MODEL

The optimizer is able to produce a good estimate for this compound predicate

only if it knows the cardinality of the column combination MAKE, MODEL.

Some products (e.g., DB2 for z/OS) provide an option to determine the cardinality

of N first index columns such as CARD(MAKE, MODEL) in index (MAKE,

MODEL, YEAR) in the utility which updates the optimizer's statistics. Other

products (e.g., SQL Server 2000) collect this information (SQL Server term:

density) automatically. Therefore, if columns MAKE and MODEL are the first

two columns in a multicolumn index, the optimizer may be able to ascertain the

correct filter factor for the compound predicate with highly correlated column

value distributions; remember that only Ford made Model Ts!

If the optimizer does not know CARD(MAKE, MODEL), the cardinality

of the concatenation of the first two index columns, it will assume a very low

cardinality of

1 / CARD (MAKE) × 1 / CARD (MODEL)

If the optimizer does not know CARD(MODEL), it may use a general default

cardinality (e.g., 25) or, if it knows CARD(fullkey), the cardinality of the whole

index key, it may interpolate between CARD(MAKE) and CARD(fullkey); for

example, if CARD(MAKE)

2000,

an optimizer might use 1025 (the average of 50 and 2000) as an estimate for

CARD(MAKE,MODEL). Both of these approaches can obviously lead to a filter

factor estimate that is very wrong.

Mark Gurry (2 p. 36) says that this is a common problem, according to his

Oracle experience:

=

50 and CARD(MAKE, MODEL,YEAR)

=

Relational Database Index Design and the Optimizers

Search WWH ::

Custom Search

Home