The selectivity of an index probably has the biggest impact on whether the database will use that
index or not. Selectivity refers to the number of distinct values in the indexed column relative to the total
number of rows. If an index is very selective, it has few rows for each index entry. If an index is
unselective, on the other hand, there are many rows for each index entry.
Do remember that the optimizer multiplies the selectivity of the column by the leaf_blocks
statistic to get an estimate of the number of index blocks the database must access during an index read. While it's
true that highly selective columns will indeed use an index most of the time, that isn't always the case, since
the ultimate deciding factor in the choice between an index access and a full table scan is block selectivity.
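As a quick check, you can estimate a column's selectivity from the dictionary statistics. The query below is a minimal sketch using the standard USER_TAB_COL_STATISTICS and USER_TABLES views; it assumes statistics have already been gathered, and the table and column names are placeholders.

-- Approximate column selectivity = distinct values / total rows
-- (closer to 1 means more selective; ORDERS/ORDER_DATE are placeholder names)
SELECT c.table_name,
       c.column_name,
       c.num_distinct,
       t.num_rows,
       ROUND(c.num_distinct / NULLIF(t.num_rows, 0), 4) AS selectivity
FROM   user_tab_col_statistics c
       JOIN user_tables t ON t.table_name = c.table_name
WHERE  c.table_name  = 'ORDERS'
AND    c.column_name = 'ORDER_DATE';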
In the Oracle Database 10g release (but not in the 11g release), using the
DBMS_STATS.AUTO_SAMPLE_SIZE constant to let the DBMS_STATS package choose how many rows to sample
may lead to wrong estimates of the number of distinct values (NDV). This is often
true if the table is large and there is a significant amount of skew in the data. You're better off specifying
your own sample size in order to get a more accurate value for the NDV.
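For example, on a 10g database you could gather statistics with an explicit sample size instead of the automatic one. The call below is only a sketch using the standard DBMS_STATS.GATHER_TABLE_STATS procedure; the schema name, table name, and 20 percent sample are placeholders you would adjust for your own data.

BEGIN
  -- Explicit 20% sample instead of DBMS_STATS.AUTO_SAMPLE_SIZE
  -- (SALES/ORDERS and the percentage are placeholder values)
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SALES',
    tabname          => 'ORDERS',
    estimate_percent => 20,
    cascade          => TRUE);
END;
/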
Index Clustering Factor
You can find the value of an index's clustering factor by querying the CLUSTERING_FACTOR column of
the DBA_INDEXES view. The clustering factor tells you how well ordered a table's rows are in comparison
with the index's entries. If the clustering factor is close to the number of rows in the table, the rows are likely
to be randomly ordered, and it's less likely that the rows pointed to by the entries in a single index leaf
block are located in the same data block. The table's selectivity (with filtering) multiplied by the index clustering factor
determines the cost of the table access by index. Note that it is the table selectivity that is used in this portion
of the calculation. Although most of the time the index selectivity (multiplied by leaf_blocks and
discussed in the previous section) and the table selectivity are the same, this part of the computation is
intended to estimate how many table data blocks will need to be accessed. Often, the index clustering
factor is assumed to be a guarantee of randomness, but it isn't. For example, consider the case where
the index entries refer to only two distinct blocks, but the entries are ROWID ordered as block 1,
block 2, block 1, block 2, and so on. If there were 10,000 entries in the index, the clustering factor would
be 10,000, yet in reality only 2 blocks would be accessed. Therefore, the clustering
factor isn't guaranteed to prove randomness in quite the same way that most people may think of it.
In an index with a "good" clustering factor, the index values in a specific index leaf block point to
rows located in the same data blocks. On the other hand, in an index with a "bad" clustering factor,
index values in a specific leaf block point to many different data blocks. A well-organized index structure has a
good clustering factor and can read its data with fewer I/Os. A poorly organized index on the same
amount of data will require a much larger number of I/Os to read the data from a larger set of data
blocks. In this context, it's crucial to note that only one index per table is likely to be ideally organized,
and even that is only true if the table data was loaded in a specific order. For instance, perhaps the
table was loaded in order of a column such as order_date. In that case, the clustering factor for the index on
order_date will be almost identical to the number of blocks in the table. But since the table can only be
stored in one order, all other indexes will be "less optimally" ordered. So, in the end, "good" and "bad"
become more relative and less absolute. This is probably one of the reasons why the cost calculation for
an index access comprises multiple elements, including both table and index selectivity, so that no single
component gets an extremely heavy weighting.
A rule of thumb is that a good clustering factor is closer to the number of blocks in a table, and a poor
clustering factor is closer to the number of rows in a table.
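The following query is a minimal sketch of how you might apply this rule of thumb. It compares the CLUSTERING_FACTOR column of DBA_INDEXES, mentioned earlier, with the BLOCKS and NUM_ROWS statistics in DBA_TABLES; the owner name is a placeholder, and the comparison is only meaningful if the statistics are current.

-- Compare each index's clustering factor with the table's blocks and rows
-- (SALES is a placeholder owner; relies on up-to-date optimizer statistics)
SELECT i.index_name,
       i.table_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
FROM   dba_indexes i
       JOIN dba_tables t
         ON  t.owner      = i.table_owner
         AND t.table_name = i.table_name
WHERE  i.table_owner = 'SALES'
ORDER  BY i.table_name, i.index_name;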
In addition to the two factors discussed here, the value of the multiblock read count has a bearing
on the usage of indexes. The higher the value of the multiblock read count
( DB_FILE_MULTIBLOCK_READ_COUNT ), the lower the cost of a full table scan from the optimizer's point of
view.
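You can check the current setting as shown below. This is a simple sketch that reads the parameter from V$PARAMETER; the ALTER SESSION statement is shown only as an illustration, since in recent releases Oracle sets this parameter automatically and changing it merely to favor index access is rarely advisable.

-- Check the current multiblock read count
SELECT name, value
FROM   v$parameter
WHERE  name = 'db_file_multiblock_read_count';

-- Illustration only: a lower value raises the optimizer's cost for full table scans
ALTER SESSION SET db_file_multiblock_read_count = 16;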
 