Designing a Cost Model - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

When additionally considering index-based joins, the analysis of INUM be-

comes more complex. The problem is that some properties in the cost model

of hash- and merge-based join operators are not applicable when considering

index-based joins. For that reason, INUM uses a simplified cost model and

additional optimization calls to be able to handle index-based joins. Although

the number of optimization calls in the preprocessing step might be high,

INUM is able to perform very quick reoptimizations afterward and results in

performance improvements during tuning sessions that explore large numbers

of configurations.

5.4 Configuration-Parametric Query

Optimization (CPQO)

The INUM technique described earlier can be used to infer costs of queries

under arbitrary configurations at a fraction of the cost of a what-if optimiza-

tion call. However, INUM is not fully integrated with the query optimizer.

For that reason, it is not clear how to extend the approach for more complex

execution plans (such as correlated subqueries). INUM also relies on some

assumptions to reduce the number of regular optimization calls per query in

the precomputation phase (which could be exponential in the number of ta-

bles in the worst case). As an example, for TPC-H query 15, INUM requires

over a thousand regular optimization calls for simpler variations of the input

query before it can start optimizing arbitrary configurations. In this section

we present a technique called configuration-parametric query optimization ,or

CPQO for short. The idea is to issue a single optimization call per query

(possibly with a larger overhead than that of a regular optimization call) and

obtain back a compact representation of all possible internal subplans that

allows us to eciently generate execution plans for arbitrary configurations.

Then, the modest overhead during the first (and only) optimization call is

amortized when the same query is reoptimized for different configurations.

We next introduce CPQO in the context of a transformation-based optimizer

such as Cascades (see Chapter 2 for details).

Figure 5.8 shows a high-level, conceptual illustration of CPQO . Consider

the final MEMO structure after optimizing an input query q . If we consider only

physical operators, this final MEMO is a directed acyclic graph, where each node

is a physical operator and edges going out of a node G connect G with its best

children. The rule set in a transformation-based optimizer contains a small

subset that deals with access path selection (see Section 2.3.3.3). Intuitively,

these are the few rules that might make a difference when optimizing the

same query under different configurations. Now suppose that we identify the

subgraphs in the MEMO that were produced by a rule that deals with access

path selection (e.g., P 1 ,... ,

P 4 in Figure 5.8a). Then, everything that is above

Search WWH ::

Custom Search

Home