Database Reference
In-Depth Information
C the most. If no such configuration is found in line 10, the best configuration
found so far is returned in line 12. Otherwise, the improved configuration newC
takes the place of C , and the greedy strategy repeats.
Note that if m =0, greedy(m,B) results in a pure greedy approach. On the
other hand, when m =
, greedy(m,B) degenerates into the exhaustive ap-
proach. Therefore, the algorithm is computationally ecient only if m is small
relative to | ES | . In such a case, the enumeration strategy exhibits “near greedy
behavior.” The reason that greedy(m,B) generally performs well is that in
many cases the largest cost reductions often result from indexes that are good
candidate indexes by themselves. At the same time, however, it is important
to capture significant index interactions . These correspond to situations in
which a single index is not useful unless another index is also present in the
configuration (e.g., consider the case of a merge-based join simultaneously
using two covering indexes). Such index interactions justify the exhaustive
phase of greedy(m,B) , which helps capture interactions that have the most
significant effect on cost.
|
|
ES
6.1.1.1
Variants
A variant of the previously given enumeration algorithm additionally relies
on branch-and-bound. The idea is to use greedy(m,B) with a low value of
m to quickly generate a first-cut solution. Subsequently, configurations are
enumerated with greedy(m', k) for some m >
m . While we are calculating the
cost of the workload under a specific configuration, we can stop (and discard
the configuration) as soon as the cost of any prefix of the workload exceeds
the total cost of the first-cut solution. This approach executes greedy(m,B)
twice, but it might save significant what-if calls for undesirable configurations
during the second execution.
Alternatively, we can obtain several seed configurations (either by keeping
the top configurations found in lines 1-5 of Figure 6.3 or as result of using
other heuristics) and apply the greedy phase in lines 6-11 over each seed. Such
alternatives help prevent local minima and in general traverse a larger search
space with only a modest overhead.
We finally note that the initial formulation of greedy(m,B) was used to
enumerate multicolumn indexes after obtaining an enumeration space with
only single-column indexes. The idea is to call greedy(m,B) with just single-
column indexes in the enumeration space and to obtain back a configuration
C 1 . Then, we add to the enumeration space all two-column indexes that can
be extended from an index in C 1 and call greedy(m,B) again with this new set,
obtaining back C 2 . We repeat this procedure until indexes of a certain length
have been considered and return the best configuration overall. Since new
candidate selection mechanisms do consider multicolumn indexes, however,
this procedure is not widely used in practice anymore.
Search WWH ::




Custom Search