Physical Database Design - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

The second question is what exactly constitutes the optimizing function.

The problem statement asks for “executing queries in W as eciently as pos-

sible.” While this is what ultimately matters in a production system, it in-

troduces some technical di culties. First, it requires us to measure actual

execution times of queries (which could take a long time) and updates (which

would change the database system) just to evaluate the quality of a given

configuration or compare two alternative configurations. Additionally, it re-

quires that the system executing queries is free from interference from other

concurrent database queries or even from operating system processes. Since

this is almost impossible to achieve in practice, execution times are usually

nondeterministic, and it becomes dicult to reason with such an optimizing

function. An alternative approach, which does not require executing queries,

is to model the cost it would take a query to execute under a given configura-

tion. If the model accurately predicts the execution time of a query without

actually executing it , it can serve as a proxy for the actual metric, and all

the previous problems no longer apply. As discussed in Chapter 2, the query

optimizer models execution costs of queries as part of its own processing.

This means that, associated with each execution plan, there is a notion of

how costly is the plan in optimizer units. It could be argued that, in order to

correctly distinguish among various execution plans, the cost model of the op-

timizer needs to be strongly correlated with actual execution costs (of course,

the optimizer cost units need not be seconds but abstract units of work). We

therefore define the eciency of a query in terms of its estimated cost as given

by the query optimizer.

The final question is how to aggregate individual query costs. The simplest

and most widely used metric adds together the estimated cost of all queries

and thus minimizes the total workload cost. Other alternatives include mini-

mizing the cost of the slowest query, or the average improvement ratio (i.e.,

the average of the cost of each query under the final configuration divided by

the cost of the query under the current configuration).

We are now ready to present a refined version of the physical design problem,

which includes all aspects of the previous discussion:

(Refined) Physical design problem: Given a workload W ={ Q 1 ,... , Q n } ,

where each Q i is a SQL query or update, and a storage bound B , obtain the

index configuration C

such that (1) j size( I j ) ≤ B , and (2)

={ I 1 ,... , I k }

j cost( Q j , C ) is minimized, where cost ( Q , C ) is the cost of the optimal plan

for Q found by the optimizer when all and only indexes in C are available.

3.1 The Complexity of the Physical Design Problem

In this section we analyze the complexity of the physical design problem as

defined previously. We first review some of the many rules of thumb that are

widely applied in database installations and discuss why a more systematic

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home