set of candidate indexes for each query in the workload. Different techniques
are possible, ranging from purely syntactic ones that examine the query string
to more sophisticated ones that leverage the query optimizer itself. Once we
characterize the set of candidates for each query, the second step is to define
the search space for the workload. Some earlier techniques restrict this space
to simply the union of the candidate indexes of each query, whereas more
recent approaches also consider additional indexes that result from combining
candidate indexes from different queries. In Chapter 4 we present a detailed
discussion of all these alternatives.
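To make the two steps concrete, the following Python sketch builds a workload-level search space from per-query candidate sets; candidates_for and merge are hypothetical placeholders for the candidate generation and index-combination techniques discussed in Chapter 4, not functions defined in this book.

def search_space(workload, candidates_for, merge):
    # Step 1: obtain the candidate indexes for each individual query.
    per_query = [set(candidates_for(q)) for q in workload]

    # Step 2a: the simplest search space is the union of all per-query candidates.
    space = set().union(*per_query) if per_query else set()

    # Step 2b: more recent approaches also add indexes obtained by combining
    # candidates, including candidates that originate from different queries.
    base = list(space)
    for c1 in base:
        for c2 in base:
            merged = merge(c1, c2)      # returns None when no useful merge exists
            if merged is not None:
                space.add(merged)
    return space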
3.2.2 Cost Model
To perform any sort of meaningful traversal over the search space of index
configurations, we need the ability to calculate both the size of any given
configuration C and the cost of evaluating the workload under C. Estimating
the size of a configuration is not difficult, since it is essentially the sum of the
sizes of the indexes in the configuration. These sizes can be calculated from
statistics of the corresponding base tables (i.e., number of tuples and tuple
length).
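As a simple illustration of this calculation, the sketch below approximates index and configuration sizes from those base-table statistics; it is not taken from the text, and the pointer and overhead constants are assumptions.

def estimated_index_size(num_tuples, key_column_widths,
                         row_pointer_bytes=8, per_entry_overhead_bytes=4):
    # Each index entry stores the key columns plus a pointer back to the base
    # row and some per-entry bookkeeping (the byte counts here are illustrative).
    entry_bytes = sum(key_column_widths) + row_pointer_bytes + per_entry_overhead_bytes
    return num_tuples * entry_bytes

def estimated_configuration_size(configuration):
    # configuration: a list of (num_tuples, key_column_widths) pairs, one per index.
    # The size of a configuration is essentially the sum of its index sizes.
    return sum(estimated_index_size(n, widths) for n, widths in configuration)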
Calculating the cost of evaluating the workload under an arbitrary configuration
is more challenging. The most accurate procedure involves creating the
indexes in the configuration, optimizing all queries in the workload, and adding
up the estimated costs of all queries. While optimizing queries is relatively
cheap, creating the indexes in the configuration is prohibitively expensive, as
it requires repeatedly scanning and sorting tables in the database.
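Written out, this accurate but expensive procedure looks roughly as follows; create_index, drop_index, and optimizer_estimated_cost are hypothetical stand-ins for the corresponding server operations.

def workload_cost_by_materialization(configuration, workload, db):
    # Physically build every index in the configuration (scanning and sorting
    # base tables), optimize each query, and add up the estimated costs.
    for index in configuration:
        db.create_index(index)
    try:
        return sum(db.optimizer_estimated_cost(q) for q in workload)
    finally:
        # Undo the physical changes before trying the next configuration.
        for index in configuration:
            db.drop_index(index)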
An early alternative to the accurate but expensive approach described above
was to build a cost model outside of the optimizer and predict how query
costs would vary with changes to the physical design. These models ranged
from crude approximations to sophisticated ones in which histograms were
built and used to compute the selectivity of predicates and, ultimately, the
costs of queries. Such approaches worked reasonably well when query engines
were rather simple and workloads consisted of short OLTP queries. With
modern query engines and complex decision support queries, however, these
approaches quickly became impractical. The main problem is that indexes are
useful only if the optimizer chooses a plan that leverages them. Even if an
index seems a great candidate, it is irrelevant if the optimizer does not choose
(or even consider) the corresponding execution plan. Therefore, it is not a good
idea to second-guess the optimizer.
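As an illustration of the kind of reasoning these external models performed, the sketch below estimates the selectivity of a range predicate from a histogram, assuming values are uniformly distributed within each bucket; it exemplifies the general technique rather than any specific system.

def range_selectivity(histogram, total_rows, upper_bound):
    # histogram: list of (bucket_low, bucket_high, bucket_count) triples that
    # partition the column domain. Estimates the fraction of rows whose value
    # is at most upper_bound, interpolating within the boundary bucket.
    qualifying = 0.0
    for low, high, count in histogram:
        if high <= upper_bound:
            qualifying += count                          # entire bucket qualifies
        elif low <= upper_bound:
            qualifying += count * (upper_bound - low) / (high - low)
    return qualifying / total_rows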
These issues result in an interesting conflict. On one hand, we need to be in
sync with the optimizer to decide the cost of a workload under a given
configuration. On the other hand, it seems that we cannot use the optimizer short of
actually performing very expensive index materializations. A subsequent line
of work solved this problem by implementing what is called a what-if
optimization layer, which obtains estimated execution costs of plans under arbitrary
configurations without actually materializing indexes. A detailed discussion
of this component is the subject of Chapter 5.
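Conceptually, such a layer lets us replace the physical index creation in the earlier sketch with metadata-only hypothetical indexes; the interface below is purely illustrative and does not correspond to any particular system's API.

def workload_cost_what_if(configuration, workload, what_if_optimizer):
    # Describe the indexes (key columns, estimated sizes, statistics) to the
    # optimizer as hypothetical objects, then ask for the estimated plan cost
    # of each query as if those indexes actually existed.
    with what_if_optimizer.hypothetical_configuration(configuration):
        return sum(what_if_optimizer.estimated_cost(q) for q in workload)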