set of candidate indexes for each query in the workload. Different techniques
are possible, ranging from purely syntactic ones that examine the query string
to more sophisticated ones that leverage the query optimizer itself. Once we
characterize the set of candidates for each query, the second step is to define
the search space for the workload. Some earlier techniques restrict this space
to simply the union of the candidate indexes of each query, whereas more
recent approaches also consider additional indexes that result from combining
candidate indexes from different queries. In Chapter 4 we present a detailed
discussion of all these alternatives.
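To make the two steps concrete, the following Python sketch builds a workload-level search space from per-query candidate sets; candidates_for and merge are hypothetical placeholders for the candidate generation and index-combination techniques discussed in Chapter 4, not functions defined in this book.

def search_space(workload, candidates_for, merge):
    # Step 1: obtain the candidate indexes for each individual query.
    per_query = [set(candidates_for(q)) for q in workload]

    # Step 2a: the simplest search space is the union of all per-query candidates.
    space = set().union(*per_query) if per_query else set()

    # Step 2b: more recent approaches also add indexes obtained by combining
    # candidates, including candidates that originate from different queries.
    base = list(space)
    for c1 in base:
        for c2 in base:
            merged = merge(c1, c2)      # returns None when no useful merge exists
            if merged is not None:
                space.add(merged)
    return space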
3.2.2 Cost Model
To perform any sort of meaningful traversal over the search space of index
configurations, we need the ability to calculate both the size of any given
configuration C and the cost of evaluating the workload under C. Estimating
the size of a configuration is not difficult, since it is essentially the sum of the
sizes of the indexes in the configuration. These sizes can be calculated from
statistics of the corresponding base tables (i.e., number of tuples and tuple
length).
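As a simple illustration of this calculation, the sketch below approximates index and configuration sizes from those base-table statistics; it is not taken from the text, and the pointer and overhead constants are assumptions.

def estimated_index_size(num_tuples, key_column_widths,
                         row_pointer_bytes=8, per_entry_overhead_bytes=4):
    # Each index entry stores the key columns plus a pointer back to the base
    # row and some per-entry bookkeeping (the byte counts here are illustrative).
    entry_bytes = sum(key_column_widths) + row_pointer_bytes + per_entry_overhead_bytes
    return num_tuples * entry_bytes

def estimated_configuration_size(configuration):
    # configuration: a list of (num_tuples, key_column_widths) pairs, one per index.
    # The size of a configuration is essentially the sum of its index sizes.
    return sum(estimated_index_size(n, widths) for n, widths in configuration)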
Calculating the cost of evaluating the workload under an arbitrary configuration
is more challenging. The most accurate procedure involves creating the
indexes in the configuration, optimizing all queries in the workload, and adding
up the estimated costs of all queries. While optimizing queries is relatively
cheap, creating the indexes in the configuration is prohibitively expensive, as
it requires repeatedly scanning and sorting tables in the database.
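Written out, this accurate but expensive procedure looks roughly as follows; create_index, drop_index, and optimizer_estimated_cost are hypothetical stand-ins for the corresponding server operations.

def workload_cost_by_materialization(configuration, workload, db):
    # Physically build every index in the configuration (scanning and sorting
    # base tables), optimize each query, and add up the estimated costs.
    for index in configuration:
        db.create_index(index)
    try:
        return sum(db.optimizer_estimated_cost(q) for q in workload)
    finally:
        # Undo the physical changes before trying the next configuration.
        for index in configuration:
            db.drop_index(index)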
An early alternative to the accurate but expensive approach described above
was to build a cost model outside of the optimizer and predict how query
costs would vary with changes to the physical design. These models ranged
from crude approximations to sophisticated ones in which histograms were
built and used to compute the selectivity of predicates and, ultimately, the
costs of queries. Such approaches worked reasonably well when query engines
were rather simple and workloads consisted of short OLTP queries. With
modern query engines and complex decision support queries, however, these
approaches quickly became impractical. The main problem is that indexes are
useful only if the optimizer chooses a plan that leverages them. Even if an
index seems a great candidate, it is irrelevant if the optimizer does not choose
(or even consider) the corresponding execution plan. Therefore, it is not a good
idea to second-guess the optimizer.
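As an illustration of the kind of reasoning these external models performed, the sketch below estimates the selectivity of a range predicate from a histogram, assuming values are uniformly distributed within each bucket; it exemplifies the general technique rather than any specific system.

def range_selectivity(histogram, total_rows, upper_bound):
    # histogram: list of (bucket_low, bucket_high, bucket_count) triples that
    # partition the column domain. Estimates the fraction of rows whose value
    # is at most upper_bound, interpolating within the boundary bucket.
    qualifying = 0.0
    for low, high, count in histogram:
        if high <= upper_bound:
            qualifying += count                          # entire bucket qualifies
        elif low <= upper_bound:
            qualifying += count * (upper_bound - low) / (high - low)
    return qualifying / total_rows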
These issues result in an interesting conflict. On one hand, we need to be in
sync with the optimizer to decide the cost of a workload under a given
configuration. On the other hand, it seems that we cannot use the optimizer short of
actually performing very expensive index materializations. A subsequent line
of work solved this problem by implementing what is called a what-if
optimization layer, which obtains estimated execution costs of plans under arbitrary
configurations without actually materializing indexes. A detailed discussion
of this component is the subject of Chapter 5.
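Conceptually, such a layer lets us replace the physical index creation in the earlier sketch with metadata-only hypothetical indexes; the interface below is purely illustrative and does not correspond to any particular system's API.

def workload_cost_what_if(configuration, workload, what_if_optimizer):
    # Describe the indexes (key columns, estimated sizes, statistics) to the
    # optimizer as hypothetical objects, then ask for the estimated plan cost
    # of each query as if those indexes actually existed.
    with what_if_optimizer.hypothetical_configuration(configuration):
        return sum(what_if_optimizer.estimated_cost(q) for q in workload)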