Automated Physical Database Design - Physical Database Design

Databases Reference

In-Depth Information

were created for the table, would it be used and would it improve the performance of

the workload? To discover where additional indexing will be beneficial the definition

of a new index is added to the physical design of the table within the memory of the

system strictly for exploration purposes.

The index is not actually created, but rather the definition of the index along with

associated statistics is simulated within system memory, without any associated data.

When a query is processed the query optimizer will treat the virtual index as though it

were a real index, and include it as query execution plans are considered and evaluated.

From the point of view of the query compiler's cost-based optimization it does not mat-

ter at all that the index does not really exist. The compiler requires information about

the index (such as its key definitions and statistics) but does not require that the index

actually exists on disk. If the resulting estimated cost (calculated by the query compiler)

of the overall workload is lower than the estimated cost without the virtual index, then

the assumption is that the index has been beneficial to the workload. Naturally this

requires a special mode of execution for the query compiler where such virtual objects

can be attached and evaluated without queries actually being executed (i.e., where query

execution plans are evaluated, but data is not accessed and retrieved) similar to the

Explain processing described in Chapter 11.

In 1997 Chaudhuri and Narasayya published a landmark paper where they sug-

gested the use of the database's query optimizer to help automatically select indexes.

They proposed that the value of candidate indexes to the system workload could be esti-

mated by evaluating their impact on the access path cost, as simulated by the query

optimizer. The candidate indexes need not be fully materialized, but simply created in a

virtual sense with enough detail that the query optimizer would perceive as real within

its modeling of the cost and access path selection. Although their paper does not use the

terminology, in essence they suggested that the query optimizer be used as an “evalua-

tion function” for an optimization search.

The notion of an evaluation function is standard terminology in the artificial intel-

ligence (AI) community, and is used to evaluate the relative virtue of points considered

during the search. 3 The idea of reusing the database optimizer's cost estimations for

evaluating the cost benefit of physical design changes is based on the observation that

the query optimizer's cost modeling is sensitive to both the logical and physical design

of a database. Having a model for workload resource consumption allows us to exploit

3 Very commonly used in AI search schemes such as genetic algorithms, simulated annealing, hill

climbing, etc. In many optimization search algorithms the algorithm must have some way of eval-

uating the benefit to attribute to a current possible solution. This process is commonly referred to

as an “evaluation function.” In hill climbing this can be a measure of the slope between the current

position in the search and each of the candidate next steps. In genetic algorithms the evaluation

function measures relative “fitness” of member elements of the population, etc.

Search WWH ::

Custom Search

Home