Query Optimization in Relational Database Systems - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

R and

R − R . For each pair of table subsets, line 6 obtains the best plan

that joins tables in

R − R . For that purpose, the proce-

dure bestJoinAlternative tries different join implementations (e.g., hash-,

merge-, and index-based joins). Note that the input to bestJoinAlternative

is bestPlan(

R and tables in

R − R ) , therefore leveraging the principle of

optimality. Lines 7 and 8 update bestPlan for the current subset R based on

the cost model (note that suboptimal plans are discarded and never consid-

ered again). Finally, line 9 returns the best plan for the whole set of tables,

which corresponds to the optimal join reordering for the input query. We

complement the description of the dynamic programming algorithm for join

reordering with a couple of important refinements.

First, consider a query that represents the join among

R ) and bestPlan(

{

R 1 ,

R 2 ,

R 3 }

with

the predicates R 1 .

R 2 .

R 3 .

a . Let us also assume that the cost of a

hash-based join plan for R 1

R 2 is smaller than that of a merge-based join

alternative. In such a case, bestPlan( {

R 2 } ) would keep only the hash-

based join alternative discarding the suboptimal merge-based join. However,

note that if the merge-based join alternative is used in R 1

R 1 ,

R 2 , the result of

the join is sorted on columns R 1 .

a . The sorted order may significantly

reduce the cost of the subsequent join with R 3 . Thus, pruning the merge join

alternative for R 1

a and R 2 .

R 2 can result in suboptimality of the global plan. The

problem arises because the result of the merge join between R 1 and R 2 has

an ordering of tuples in the output stream that is useful in a subsequent

join. However, the hash-based join alternative, while cheaper locally, does

not have such ordering. To address this problem, the dynamic programming

algorithm is extended to keep in bestPlan an optimal plan for every choice

of an “interesting” sort order. Two plans are then compared if they represent

the same expression and have the same interesting order.

Second, note that the algorithm in Figure 2.6 enumerates all bushy plans

(including those that perform early cartesian products). The algorithm can

easily be adapted to enumerate through different search spaces by modifying

line 5. For instance, if we want to consider only (left) linear join trees, line 5

should be changed to:

R ⊂ R such that | R |=1

foreach

Cartesian products can be avoided until all joins have been processed by fur-

ther modifying line 5. Of course, the complexity of the algorithm and quality

of resulting plans strongly depend on the search space that is enumerated.

2.3.2 Starburst

The Starburst project at IBM Almaden Research Center built an extensible

relational DBMS. Query optimization in Starburst begins with a structural

representation of the SQL query that is used throughout the optimization

process. This representation is called the Query Graph Model (QGM). In the

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home