Database Reference
In-Depth Information
indexes without making any measurements; the current optimizers do not issue
any SELECT calls to the database when choosing the access path. In a limited
way, the current optimizers validate their assumptions at execution time; they
may, for instance, decide to turn on sequential prefetch for an index or a table
after observing the page access pattern. In the future, the best optimizers may
also ascertain the real filter factors at execution time.
With a simple SELECT, the most important decisions to be made are the
choice of index (or indexes in the case of multiple index access) and the way
it is used (matching columns, screening columns, sequential prefetch). For joins,
the join method and table access order are also critical.
All serious relational DBMSs now have a cost-based optimizer; they identify
a number of reasonable alternatives and make cost estimates for them. The cost
estimate is largely a prediction of the local response time—a weighted average
of the CPU time and the synchronous I/O time. The estimate is based on statistics
collected by a utility, for example, the size of the tables and their indexes together
with the distribution of the values of selected columns. Many of these value
distributions are optional. The maximum set may include the following data per
column or column combination:
ž Number of distinct values ( cardinality )
ž Second largest and the second smallest value
ž N most common values and their frequencies (the number of rows with a
given value)
ž Histogram with N bars (2% below 10, 5% below 20, etc., or the percentile
values 2%, 4%, 6%, etc.)
The cost formulas of the best optimizers contain tens of variables (compared
with the two variables used in the QUBE—TR and TS). These variables may
include hardware information as well, such as processor speed, disk drive speed,
and the size of the database buffer pool. The oldest cost-based optimizers have
already had their 21st birthday, and the developers have improved them continu-
ally. Yet they sometimes choose a totally wrong access path, even for an innocent
looking SQL statement. We will now discuss why this is so.
Essentially, there are two basic problems that we have to learn to live with;
we will deal with each of them in turn:
1. Optimizers do not always see the best alternative.
2. Optimizers' cost estimates may be very wrong.
OPTIMIZERS DO NOT ALWAYS SEE THE BEST
ALTERNATIVE
Matching and Screening Problems
Difficult predicates , defined in Chapter 6 as those that cannot participate in defin-
ing the index slice , used to be the most common reason for optimizer problems.
This type of problem has become less frequent over the years for two reasons:
Search WWH ::




Custom Search