Database Reference
In-Depth Information
we optimize q under C 2 and obtain plan P 2 and moreover that the internal
subplan IP 2 for P 2 is different from IP 1 (see again Figure 5.7). Note that
both IP 1 and IP 2 are just different join reorderings of the same query. As
such, we can instantiate all index-based subplans from P 1 (i.e., R 1 , S 1 , and T 1
in the figure) in the internal subplan IP 2 and obtain a valid plan for q under
configuration C 1 . Since the optimizer did not return such a plan for q under
C 1 , and the cost of R 1 , S 1 , and T 1 is independent of the internal subplan they
belong to, we can conclude that the cost of IP 1 is smaller than that of IP 2 .
But then we can construct a plan that uses IP 1 and index subplans R 2 , S 2 ,
and T 2 from P 2 . This plan, called P 2 , is valid for q under configuration C 2 .
Additionally, it is more ecient than P 2 because, as we have shown before, IP 1
is more ecient than IP 2 , and R 2 , S 2 , and T 2 have the same cost in both P 2 and
P 2 . We conclude then that P 2 is not optimal for C 2 , which is a contradiction.
Therefore, every optimal plan in this restricted scenario must contain the same
internal subplan, and the upper-bound technique of the previous section would
return accurate values of cost
for arbitrary configurations.
Of course, the previous example relies on a drastically simplified query
processor. In reality, indexes on join columns can result in plans that use
merge- or index-based joins. Therefore, a better execution plan that is based
on a different join order would be missed by the algorithm in Figure 5.5. INUM
is a technique that extends the upper-bound approach of the previous section
in the following way. During a preprocessing step, several optimization calls
are issued for a given query until the resulting inner subplans are sucient
to infer execution plans for as many configurations as possible. The resulting
inner subplans constitute INUM's space. Then, INUM considers the inner
subplans in its space as the input CS in Figure 5.5 (with some additional
improvements to avoid considering each inner subplan for every call).
The set of optimization calls required in the preprocessing step of a given
query depends on the capabilities of the query engine. In the trivial (and
unrealistic) case described in Figure 5.7, a single optimization call would be
enough to obtain the internal subplan needed to infer optimal plans for ar-
bitrary configurations (using hash-based joins only). When we additionally
consider merge joins, INUM needs to optimize query q for configurations that
include indexes for each combination of interesting orders of q . Interesting
orders are derived from columns that are mentioned in a join predicate or a
group-by clause. Consider a query q referencing tables T 1 ,... ,
(
q
,
C
)
T n and let O i
be the set of interesting orders for table T i . The set O
O n con-
tains all possible combinations of interesting orders that a configuration can
cover. Making some general assumption on the optimizer cost model, it can be
shown that for every member of O there exists a single optimal inner subplan
using hash- or merge-based joins. Thus, to compute the INUM space in this
case, it is enough to optimize the query once for each member of O using
some representative configuration. Although in the worst case this technique
requires an exponential number of optimization calls in the number of query
tables, there exist heuristics to reduce the number of optimization calls.
=
O 1 × ... ×
Search WWH ::




Custom Search