New Challenges in Physical Database Design - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

are, although important, less relevant. The reason is that using execu-

tion costs potentially introduces additional variables that are outside

the scope of the evaluated tool. When purely evaluating the quality of a

physical design tuner , we should be careful to freeze any external vari-

ables. It is therefore reasonable to assume that the optimizer is correct

and the physical design tool exploits accurate information. Using the op-

timizer's expected cost rather that the actual execution cost of queries

has precisely that effect, provided that the optimizer is operating under

the same statistical model for all configurations.

It is important to note the we can execute only what the optimizer

considers is the best plan for a given query. Consider the following ex-

ample:

SELECT R.*

FROM R, S

WHERE predicate(R) AND R.x=S.y

and suppose that the optimizer estimates that only a handful of tu-

ples from R satisfy predicate(R) . If an index on S

y is available, the

optimizer would find that an index-based join that first gets all valid

tuples from R and then looks up the matches from S might be a bet-

ter alternative than, say, a hash join. Now suppose that the estimate is

not right due to limitations in the optimizer's cost model and in real-

ity that almost all tuples in R satisfy predicate(R) . In this case, the

index-based join plan, although it is costed the lowest by the optimizer

and therefore chosen if possible, would execute much slower than the

suboptimal (to the eyes of the optimizer) hash-based join alternative.

Now the problem is clear. Consider the previous query under the base

configuration (denoted C 0 ) and the one that includes all single-column

indexes (denoted C 1 ). The optimizer would pick the hash-based join-

based alternative under C 0 (because there is no index on S

.

y in C 0 ) and

the index-based join alternative under C 1 (because the index is present).

The net effect is that the execution cost under C 1 would be significantly

worse than that under C 0 , and we would tend to rank the tuner that

produced C 0 higher than the one that produced C 1 . However, note that

under C 1 the optimizer considered the hash-based join alternative but

discarded it in favor of the index-based join plan! In fact, within the

optimizer's cost model, the index-based join alternative is better than

the hash-based join alternative in both C 1 and C 0 (although the former

plan is not implementable under C 0 ).

Consequences of “tweaking” metrics: It is sometimes tempting to in-

troduce small tweaks to the evaluation metric to address special corner

cases. This, however, has to be done with much care to avoid unintended

consequences. We illustrate this issue with an example taken from the

M

.

metric. As explained earlier, the

M

metric optionally relies on a

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home