possible, meaning that it's almost the same as the number of rows in the table (10,000,000). With each of
the table's 100,000 data blocks containing 100 rows, the database selects 10 rows, or 1 percent, from each
data block. Thus, the total cost of accessing the table data is the selectivity times the clustering factor,
which is 0.01 * 10,000,000, or 100,000. So the approximate cost of the index-based read is 202 index block
accesses plus 100,000 table block accesses, for a grand total of 100,202 block reads.
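As a quick sanity check, the same arithmetic can be expressed as a query. This is purely illustrative: the figures 202, 0.01, and 10,000,000 come from the example above, not from the optimizer's internal costing formula.

    -- Rough arithmetic from the example: index block reads plus
    -- (selectivity x clustering factor) table block reads.
    SELECT 202 + (0.01 * 10000000) AS approx_index_cost   -- 100,202 block reads
    FROM   dual;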
When it comes to the full table scan costs, remember that unlike an index read, which is always
done in single block I/Os, a full table scan uses multiblock reads. Since this example assumes that each
table block contains 100 rows, the database needs to scan roughly 100,000 data blocks during a full table
scan. Let's assume the multi_block_read_count value is set at 10. The database will then have to perform
a total of 100,000/10; that is 10000 reads. If you want to be more precise, you can also add the read of the
segment header block to the total, making it 10,001.
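If you want to check what multiblock read size your own database uses, you can look at the initialization parameter directly, assuming you have access to SQL*Plus or to V$PARAMETER:

    -- In SQL*Plus / SQLcl:
    SHOW PARAMETER db_file_multiblock_read_count

    -- Or from the dynamic performance view:
    SELECT name, value
    FROM   v$parameter
    WHERE  name = 'db_file_multiblock_read_count';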
Clearly, the full table scan in this case is much cheaper (10,001 multiblock reads compared to 100,202
single-block reads for the index access), even after assuming the worst possible index clustering factor. In
addition, you can parallelize a full table scan, making it perform even faster. Note that in this
example, the query fetches only 1 percent of the data in a large table, yet the full table scan is still much
cheaper. This simple demonstration shows that the optimizer's choice between an index and a full table scan
doesn't always depend on the percentage of rows a query must retrieve. Rather, it depends on
critical factors such as the data distribution, the number of table and leaf blocks, the average number of
rows in a table block, the average number of leaf entries in an index leaf block, the index clustering
factor, and the size of the multiblock read count. There simply is no magic percentage of rows that a
query must retrieve, such as 1, 5, 10, 25, or 50 percent, that by itself tells the optimizer it must use an
index or a full table scan. As this example shows, the optimizer may well choose a full table
scan even when a query retrieves a very small percentage (1 percent) of a table's rows.
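You can see most of the statistics the optimizer weighs here in the data dictionary. The following sketch uses placeholder owner, table, and index names:

    -- Table-level statistics: row count, block count, average row length.
    SELECT num_rows, blocks, avg_row_len
    FROM   dba_tables
    WHERE  owner = 'APP' AND table_name = 'SALES';           -- placeholder names

    -- Index-level statistics: height, leaf blocks, clustering factor.
    SELECT blevel, leaf_blocks, distinct_keys, clustering_factor
    FROM   dba_indexes
    WHERE  owner = 'APP' AND index_name = 'SALES_CUST_IDX';  -- placeholder names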
Why the Optimizer May Ignore Indexes
Creating an index is never a guarantee that the optimizer will use it when evaluating an
execution plan. If a query selects a high percentage of rows from a table, the optimizer might
determine that it can get the results faster through a full table scan than through an index scan. Remember
that when the database uses an index, it first reads the index to obtain the ROWIDs and then uses those
ROWIDs to retrieve the requested rows. If a query selects a large percentage of rows from a table, leading
to a large percentage of the table's blocks being read, the database might perform a full table scan to
avoid reading both the index and the table, which could end up being more expensive than scanning
the table once.
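The easiest way to see which access path the optimizer actually chose for a particular statement is to look at its execution plan. A minimal sketch, with placeholder table and column names:

    -- Generate and display the execution plan for a candidate query.
    EXPLAIN PLAN FOR
      SELECT *
      FROM   sales                       -- placeholder table
      WHERE  cust_id = 1001;             -- placeholder predicate

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
    -- Look for INDEX RANGE SCAN versus TABLE ACCESS FULL in the plan output.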
Many factors determine whether the optimizer uses an index, as the following sections in this
chapter explain.
Number of Distinct Rows
A crucial factor that determines the choice between a full table scan and an index scan is the number of
distinct values in the column referenced by a query predicate vis-à-vis the total number of rows in the table.
You can find the number of rows in a table by querying the NUM_ROWS column in the DBA_TABLES view. You
can similarly find the number of distinct values in any column by querying the NUM_DISTINCT column
in the DBA_TAB_COLUMNS view. The closer the value of NUM_DISTINCT is to NUM_ROWS, the more likely
the optimizer is to prefer an index on that column over a full table scan; in other words, the more
selective an index, the more likely the database is to use it.
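You can compare the two statistics directly. The owner, table, and column names below are placeholders:

    -- Compare distinct values in a column with the table's row count;
    -- a ratio close to 1 suggests a highly selective column.
    SELECT t.num_rows,
           c.column_name,
           c.num_distinct,
           c.num_distinct / NULLIF(t.num_rows, 0) AS selectivity_ratio
    FROM   dba_tables      t
    JOIN   dba_tab_columns c
           ON c.owner = t.owner AND c.table_name = t.table_name
    WHERE  t.owner = 'APP'                 -- placeholder owner
    AND    t.table_name = 'SALES'          -- placeholder table
    AND    c.column_name = 'CUST_ID';      -- placeholder column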