Database Reference
In-Depth Information
be more than one). Third, the resulting expressions obtained by apply-
ing the rules are integrated back into the
MEMO
, possibly creating new,
unexplored
groups
(as an example, applying the join associativity rule
to expression 5.1 in Figure 2.7b results in
groupExpression
5.2, which
points to a newly created
group
7). Finally, each new
groupExpression
triggers follow-up tasks, which depend on its type. If it is a logical op-
erator, the optimizer was exploring the
group
and thus an
ExploreExpr
task is scheduled for the new
groupExpression
. Otherwise, the expres-
sion inputs are optimized, and the cost of the physical plan is calculated
(see
OptInputs
task).
OptInputs:
This task optimizes the inputs of a given physical operator
p
and computes the best execution plan rooted at
p
. For each child
p
i
of
p
,
it first calculates the required properties of
p
i
with respect to
p
and then
schedules an
OptimizeGroup
task for the
group
of
p
i
. As an example,
suppose that the root of the tree is a
Merge Join
operator. Since
Merge
Join
expects the inputs ordered by the join column, the current task
generates a required sort property for each of the inputs and optimizes
the corresponding
groups
under this new optimization context. The
Opt-
Inputs
task also implements a cost-based pruning strategy. Whenever it
detects that the lower bound of the expression that is being optimized is
larger than the cost of an existing solution, it fails and returns no plan
for that optimization goal.
As we can see, the five optimization tasks are nontrivial and depend on
each other. For clarity of exposition, we now present a conceptually simplified
version of the
OptimizeGroup
task that incorporates portions of the remain-
ing tasks that are relevant in our context. Figure 2.9 shows the algorithm for
OptimizeGroup
, which takes as inputs a
group
G
, required properties
RP
, and
a cost upper-bound
UB
.
OptimizeGroup
(
G,RP,UB
) returns the most ecient
physical
groupExpression
that satisfies
RP
and is under
UB
in cost (otherwise,
it returns
NULL
). Initially, line 1 checks the
winner circle
for a previous call
compatible with
RP
. If we find such plan, we return it (if its cost is below
UB
),
or we return
NULL
otherwise. Otherwise, line 4 iterates over all enumerated
physical
groupExpressions
in
G
(note that line 4 encapsulates
ExploreGroup
,
ExploreExpr
, and
ApplyRule
). For each such root
groupExpression
p
, line 5 es-
timates the local cost of
candP
(i.e., without counting its inputs' costs). Then,
line 8 calculates the input
group
and required properties for each of
candP
's
inputs, and line 9 recursively calls
OptimizeGroup
to optimize them (note
that the upper bound in the recursive call is decreased to
UB - candP.cost
).
After each input is successfully optimized, lines 11 and 12 store the best im-
plementation for each of
candP
's children and update its partial cost. If at
any moment the current cost of
candP
becomes larger than
UB
, the candidate
is discarded and the next one is considered (line 7). Otherwise, after
candP
is
completely optimized, line 13 checks whether
candP
is the best plan found so
Search WWH ::
Custom Search