Query Optimization in Relational Database Systems - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

be more than one). Third, the resulting expressions obtained by apply-

ing the rules are integrated back into the MEMO , possibly creating new,

unexplored groups (as an example, applying the join associativity rule

to expression 5.1 in Figure 2.7b results in groupExpression 5.2, which

points to a newly created group 7). Finally, each new groupExpression

triggers follow-up tasks, which depend on its type. If it is a logical op-

erator, the optimizer was exploring the group and thus an ExploreExpr

task is scheduled for the new groupExpression . Otherwise, the expres-

sion inputs are optimized, and the cost of the physical plan is calculated

(see OptInputs task).

OptInputs: This task optimizes the inputs of a given physical operator p

and computes the best execution plan rooted at p . For each child p i of p ,

it first calculates the required properties of p i with respect to p and then

schedules an OptimizeGroup task for the group of p i . As an example,

suppose that the root of the tree is a Merge Join operator. Since Merge

Join expects the inputs ordered by the join column, the current task

generates a required sort property for each of the inputs and optimizes

the corresponding groups under this new optimization context. The Opt-

Inputs task also implements a cost-based pruning strategy. Whenever it

detects that the lower bound of the expression that is being optimized is

larger than the cost of an existing solution, it fails and returns no plan

for that optimization goal.

As we can see, the five optimization tasks are nontrivial and depend on

each other. For clarity of exposition, we now present a conceptually simplified

version of the OptimizeGroup task that incorporates portions of the remain-

ing tasks that are relevant in our context. Figure 2.9 shows the algorithm for

OptimizeGroup , which takes as inputs a group G , required properties RP , and

a cost upper-bound UB . OptimizeGroup ( G,RP,UB ) returns the most ecient

physical groupExpression that satisfies RP and is under UB in cost (otherwise,

it returns NULL ). Initially, line 1 checks the winner circle for a previous call

compatible with RP . If we find such plan, we return it (if its cost is below UB ),

or we return NULL otherwise. Otherwise, line 4 iterates over all enumerated

physical groupExpressions in G (note that line 4 encapsulates ExploreGroup ,

ExploreExpr , and ApplyRule ). For each such root groupExpression p , line 5 es-

timates the local cost of candP (i.e., without counting its inputs' costs). Then,

line 8 calculates the input group and required properties for each of candP 's

inputs, and line 9 recursively calls OptimizeGroup to optimize them (note

that the upper bound in the recursive call is decreased to UB - candP.cost ).

After each input is successfully optimized, lines 11 and 12 store the best im-

plementation for each of candP 's children and update its partial cost. If at

any moment the current cost of candP becomes larger than UB , the candidate

is discarded and the next one is considered (line 7). Otherwise, after candP is

completely optimized, line 13 checks whether candP is the best plan found so

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home