The selectivity of an index probably has the biggest impact on whether the database will use that
index or not. Selectivity refers to the number of distinct values in the indexed column relative to the total
number of rows. If an index is very selective, it has few rows for each index entry. If an index is
unselective, on the other hand, there are many rows for each index entry.
Do remember that the optimizer multiplies the selectivity of the column by the leaf_blocks
statistic to get an estimate of the number of index blocks the database must access during an index read. While it's
true that highly selective columns will indeed use an index most of the time, that isn't always the case, since
the ultimate deciding factor in the choice between an index access and a full table scan is block selectivity.
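As a quick check, you can estimate a column's selectivity from the dictionary statistics. The query below is a minimal sketch using the standard USER_TAB_COL_STATISTICS and USER_TABLES views; it assumes statistics have already been gathered, and the table and column names are placeholders.

-- Approximate column selectivity = distinct values / total rows
-- (closer to 1 means more selective; ORDERS/ORDER_DATE are placeholder names)
SELECT c.table_name,
       c.column_name,
       c.num_distinct,
       t.num_rows,
       ROUND(c.num_distinct / NULLIF(t.num_rows, 0), 4) AS selectivity
FROM   user_tab_col_statistics c
       JOIN user_tables t ON t.table_name = c.table_name
WHERE  c.table_name  = 'ORDERS'
AND    c.column_name = 'ORDER_DATE';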
In the Oracle Database 10g release (but not in the 11g release), using the
DBMS_STATS.AUTO_SAMPLE_SIZE constant to let the DBMS_STATS package choose how many rows to sample
may lead to wrong estimates of the number of distinct values (NDV). This is often
true if the table is large and there is a significant amount of skew in the data. You're better off specifying
your own sample size in order to get a more accurate value for the NDV.
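For example, on a 10g database you could gather statistics with an explicit sample size instead of the automatic one. The call below is only a sketch using the standard DBMS_STATS.GATHER_TABLE_STATS procedure; the schema name, table name, and 20 percent sample are placeholders you would adjust for your own data.

BEGIN
  -- Explicit 20% sample instead of DBMS_STATS.AUTO_SAMPLE_SIZE
  -- (SALES/ORDERS and the percentage are placeholder values)
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SALES',
    tabname          => 'ORDERS',
    estimate_percent => 20,
    cascade          => TRUE);
END;
/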
Index Clustering Factor
You can find the value of an index's clustering factor by querying the CLUSTERING_FACTOR column of
the DBA_INDEXES view. The clustering factor tells you how well ordered a table's rows are in comparison
with the index's entries. If the clustering factor is close to the number of rows in the table, the rows are likely
to be randomly ordered, and it's less likely that the rows pointed to by the entries in a single index leaf
block are located in the same data block. The table's selectivity (with filtering) multiplied by the index clustering factor
determines the cost of the table access by index. Note that it is the table selectivity that is used in this portion
of the calculation. Although most of the time the index selectivity (multiplied by leaf_blocks and
discussed in the previous section) and the table selectivity are the same, this part of the computation is
intended to estimate how many table data blocks will need to be accessed. Often, the index clustering
factor is assumed to be a guarantee of randomness, but it isn't. For example, consider the case where
the index entries refer to only two distinct blocks, but the entries are ROWID ordered as block 1,
block 2, block 1, block 2, and so on. If there were 10,000 entries in the index, the clustering factor would
be 10,000, yet in reality only 2 blocks would be accessed. Therefore, the clustering
factor isn't guaranteed to prove randomness in quite the same way that most people may think of it.
In an index with a "good" clustering factor, the index values in a specific index leaf block point to
rows located in the same data blocks. On the other hand, in an index with a "bad" clustering factor,
index values in a specific leaf block point to many different data blocks. A well-organized index structure has a
good clustering factor and can read its data with fewer I/Os. A poorly organized index on the same
amount of data will require a much larger number of I/Os to read the data from a larger set of data
blocks. In this context, it's crucial to note that only one index per table is likely to be ideally organized,
and even that is only true if the table data was loaded in a specific order. For instance, perhaps the
table was loaded in order of a column such as order_date. In that case, the clustering factor for the index on
order_date will be almost identical to the number of blocks in the table. But since the table can only be
stored in one order, all other indexes will be "less optimally" ordered. So, in the end, "good" and "bad"
become more relative and less absolute. This is probably one of the reasons why the cost calculation for
an index access comprises multiple elements, including both table and index selectivity, so that no single
component gets an extremely heavy weighting.
A rule of thumb is that a good clustering factor is closer to the number of blocks in a table, and a poor
clustering factor is closer to the number of rows in a table.
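The following query is a minimal sketch of how you might apply this rule of thumb. It compares the CLUSTERING_FACTOR column of DBA_INDEXES, mentioned earlier, with the BLOCKS and NUM_ROWS statistics in DBA_TABLES; the owner name is a placeholder, and the comparison is only meaningful if the statistics are current.

-- Compare each index's clustering factor with the table's blocks and rows
-- (SALES is a placeholder owner; relies on up-to-date optimizer statistics)
SELECT i.index_name,
       i.table_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
FROM   dba_indexes i
       JOIN dba_tables t
         ON  t.owner      = i.table_owner
         AND t.table_name = i.table_name
WHERE  i.table_owner = 'SALES'
ORDER  BY i.table_name, i.index_name;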
In addition to the two factors discussed here, the value of the multiblock read count has a bearing
on the usage of indexes. The higher the value of the multiblock read count
( DB_FILE_MULTIBLOCK_READ_COUNT ), the lower the cost of a full table scan from the optimizer's point of
view.
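You can check the current setting as shown below. This is a simple sketch that reads the parameter from V$PARAMETER; the ALTER SESSION statement is shown only as an illustration, since in recent releases Oracle sets this parameter automatically and changing it merely to favor index access is rarely advisable.

-- Check the current multiblock read count
SELECT name, value
FROM   v$parameter
WHERE  name = 'db_file_multiblock_read_count';

-- Illustration only: a lower value raises the optimizer's cost for full table scans
ALTER SESSION SET db_file_multiblock_read_count = 16;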
 