Designing a Cost Model - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

we optimize q under C 2 and obtain plan P 2 and moreover that the internal

subplan IP 2 for P 2 is different from IP 1 (see again Figure 5.7). Note that

both IP 1 and IP 2 are just different join reorderings of the same query. As

such, we can instantiate all index-based subplans from P 1 (i.e., R 1 , S 1 , and T 1

in the figure) in the internal subplan IP 2 and obtain a valid plan for q under

configuration C 1 . Since the optimizer did not return such a plan for q under

C 1 , and the cost of R 1 , S 1 , and T 1 is independent of the internal subplan they

belong to, we can conclude that the cost of IP 1 is smaller than that of IP 2 .

But then we can construct a plan that uses IP 1 and index subplans R 2 , S 2 ,

and T 2 from P 2 . This plan, called P 2 , is valid for q under configuration C 2 .

Additionally, it is more ecient than P 2 because, as we have shown before, IP 1

is more ecient than IP 2 , and R 2 , S 2 , and T 2 have the same cost in both P 2 and

P 2 . We conclude then that P 2 is not optimal for C 2 , which is a contradiction.

Therefore, every optimal plan in this restricted scenario must contain the same

internal subplan, and the upper-bound technique of the previous section would

return accurate values of cost

for arbitrary configurations.

Of course, the previous example relies on a drastically simplified query

processor. In reality, indexes on join columns can result in plans that use

merge- or index-based joins. Therefore, a better execution plan that is based

on a different join order would be missed by the algorithm in Figure 5.5. INUM

is a technique that extends the upper-bound approach of the previous section

in the following way. During a preprocessing step, several optimization calls

are issued for a given query until the resulting inner subplans are sucient

to infer execution plans for as many configurations as possible. The resulting

inner subplans constitute INUM's space. Then, INUM considers the inner

subplans in its space as the input CS in Figure 5.5 (with some additional

improvements to avoid considering each inner subplan for every call).

The set of optimization calls required in the preprocessing step of a given

query depends on the capabilities of the query engine. In the trivial (and

unrealistic) case described in Figure 5.7, a single optimization call would be

enough to obtain the internal subplan needed to infer optimal plans for ar-

bitrary configurations (using hash-based joins only). When we additionally

consider merge joins, INUM needs to optimize query q for configurations that

include indexes for each combination of interesting orders of q . Interesting

orders are derived from columns that are mentioned in a join predicate or a

group-by clause. Consider a query q referencing tables T 1 ,... ,

(

)

T n and let O i

be the set of interesting orders for table T i . The set O

O n con-

tains all possible combinations of interesting orders that a configuration can

cover. Making some general assumption on the optimizer cost model, it can be

shown that for every member of O there exists a single optimal inner subplan

using hash- or merge-based joins. Thus, to compute the INUM space in this

case, it is enough to optimize the query once for each member of O using

some representative configuration. Although in the worst case this technique

requires an exponential number of optimization calls in the number of query

tables, there exist heuristics to reduce the number of optimization calls.

O 1 × ... ×

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home