Database Reference
In-Depth Information
The second question is what exactly constitutes the optimizing function.
The problem statement asks for “executing queries in W as eciently as pos-
sible.” While this is what ultimately matters in a production system, it in-
troduces some technical di culties. First, it requires us to measure actual
execution times of queries (which could take a long time) and updates (which
would change the database system) just to evaluate the quality of a given
configuration or compare two alternative configurations. Additionally, it re-
quires that the system executing queries is free from interference from other
concurrent database queries or even from operating system processes. Since
this is almost impossible to achieve in practice, execution times are usually
nondeterministic, and it becomes dicult to reason with such an optimizing
function. An alternative approach, which does not require executing queries,
is to model the cost it would take a query to execute under a given configura-
tion. If the model accurately predicts the execution time of a query without
actually executing it , it can serve as a proxy for the actual metric, and all
the previous problems no longer apply. As discussed in Chapter 2, the query
optimizer models execution costs of queries as part of its own processing.
This means that, associated with each execution plan, there is a notion of
how costly is the plan in optimizer units. It could be argued that, in order to
correctly distinguish among various execution plans, the cost model of the op-
timizer needs to be strongly correlated with actual execution costs (of course,
the optimizer cost units need not be seconds but abstract units of work). We
therefore define the eciency of a query in terms of its estimated cost as given
by the query optimizer.
The final question is how to aggregate individual query costs. The simplest
and most widely used metric adds together the estimated cost of all queries
and thus minimizes the total workload cost. Other alternatives include mini-
mizing the cost of the slowest query, or the average improvement ratio (i.e.,
the average of the cost of each query under the final configuration divided by
the cost of the query under the current configuration).
We are now ready to present a refined version of the physical design problem,
which includes all aspects of the previous discussion:
(Refined) Physical design problem: Given a workload W ={ Q 1 ,... , Q n } ,
where each Q i is a SQL query or update, and a storage bound B , obtain the
index configuration C
such that (1) j size( I j ) B , and (2)
={ I 1 ,... , I k }
j cost( Q j , C ) is minimized, where cost ( Q , C ) is the cost of the optimal plan
for Q found by the optimizer when all and only indexes in C are available.
3.1 The Complexity of the Physical Design Problem
In this section we analyze the complexity of the physical design problem as
defined previously. We first review some of the many rules of thumb that are
widely applied in database installations and discuss why a more systematic
Search WWH ::




Custom Search