Database Reference
In-Depth Information
R and
R R . For each pair of table subsets, line 6 obtains the best plan
that joins tables in
R R . For that purpose, the proce-
dure bestJoinAlternative tries different join implementations (e.g., hash-,
merge-, and index-based joins). Note that the input to bestJoinAlternative
is bestPlan(
R and tables in
R R ) , therefore leveraging the principle of
optimality. Lines 7 and 8 update bestPlan for the current subset R based on
the cost model (note that suboptimal plans are discarded and never consid-
ered again). Finally, line 9 returns the best plan for the whole set of tables,
which corresponds to the optimal join reordering for the input query. We
complement the description of the dynamic programming algorithm for join
reordering with a couple of important refinements.
First, consider a query that represents the join among
R ) and bestPlan(
{
R 1 ,
R 2 ,
R 3 }
with
the predicates R 1 .
a
=
R 2 .
a
=
R 3 .
a . Let us also assume that the cost of a
hash-based join plan for R 1
R 2 is smaller than that of a merge-based join
alternative. In such a case, bestPlan( {
R 2 } ) would keep only the hash-
based join alternative discarding the suboptimal merge-based join. However,
note that if the merge-based join alternative is used in R 1
R 1 ,
R 2 , the result of
the join is sorted on columns R 1 .
a . The sorted order may significantly
reduce the cost of the subsequent join with R 3 . Thus, pruning the merge join
alternative for R 1
a and R 2 .
R 2 can result in suboptimality of the global plan. The
problem arises because the result of the merge join between R 1 and R 2 has
an ordering of tuples in the output stream that is useful in a subsequent
join. However, the hash-based join alternative, while cheaper locally, does
not have such ordering. To address this problem, the dynamic programming
algorithm is extended to keep in bestPlan an optimal plan for every choice
of an “interesting” sort order. Two plans are then compared if they represent
the same expression and have the same interesting order.
Second, note that the algorithm in Figure 2.6 enumerates all bushy plans
(including those that perform early cartesian products). The algorithm can
easily be adapted to enumerate through different search spaces by modifying
line 5. For instance, if we want to consider only (left) linear join trees, line 5
should be changed to:
R R such that | R |=1
foreach
5
Cartesian products can be avoided until all joins have been processed by fur-
ther modifying line 5. Of course, the complexity of the algorithm and quality
of resulting plans strongly depend on the search space that is enumerated.
2.3.2 Starburst
The Starburst project at IBM Almaden Research Center built an extensible
relational DBMS. Query optimization in Starburst begins with a structural
representation of the SQL query that is used throughout the optimization
process. This representation is called the Query Graph Model (QGM). In the
Search WWH ::




Custom Search