Database Reference
In-Depth Information
Should a single-column index be dropped if the cardinality of the index column
is one?
For example, assume the system tables are being scanned to investigate the
current indexes. A single-column index is found with a cardinality of one. Should
this index be dropped?
It is certainly worth while spending some time considering this situation.
Our comments will be found on the ftp site for this topic. See Preface for details.
Finding an access path in a pool is much faster than repeating the access
path selection process with cost estimates. Current hardware makes an SQL pool
of several gigabytes feasible. If each SQL statement together with its access
path takes a few kilobytes, the pool may have space for a million access paths.
This is of critical importance when using a system like SAP or Peoplesoft, not
because of skewed distributions but because without reuse of saved access paths,
every SELECT would incur the overhead of estimating the cost of alternative
access paths.
Correlated Columns
WHERE MAKE = :MAKE AND MODEL = :MODEL
The optimizer is able to produce a good estimate for this compound predicate
only if it knows the cardinality of the column combination MAKE, MODEL.
Some products (e.g., DB2 for z/OS) provide an option to determine the cardinality
of N first index columns such as CARD(MAKE, MODEL) in index (MAKE,
MODEL, YEAR) in the utility which updates the optimizer's statistics. Other
products (e.g., SQL Server 2000) collect this information (SQL Server term:
density) automatically. Therefore, if columns MAKE and MODEL are the first
two columns in a multicolumn index, the optimizer may be able to ascertain the
correct filter factor for the compound predicate with highly correlated column
value distributions; remember that only Ford made Model Ts!
If the optimizer does not know CARD(MAKE, MODEL), the cardinality
of the concatenation of the first two index columns, it will assume a very low
cardinality of
1 / CARD (MAKE) × 1 / CARD (MODEL)
If the optimizer does not know CARD(MODEL), it may use a general default
cardinality (e.g., 25) or, if it knows CARD(fullkey), the cardinality of the whole
index key, it may interpolate between CARD(MAKE) and CARD(fullkey); for
example, if CARD(MAKE)
2000,
an optimizer might use 1025 (the average of 50 and 2000) as an estimate for
CARD(MAKE,MODEL). Both of these approaches can obviously lead to a filter
factor estimate that is very wrong.
Mark Gurry (2 p. 36) says that this is a common problem, according to his
Oracle experience:
=
50 and CARD(MAKE, MODEL,YEAR)
=
Search WWH ::




Custom Search