Enumerating the Search Space - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

C the most. If no such configuration is found in line 10, the best configuration

found so far is returned in line 12. Otherwise, the improved configuration newC

takes the place of C , and the greedy strategy repeats.

Note that if m =0, greedy(m,B) results in a pure greedy approach. On the

other hand, when m =

, greedy(m,B) degenerates into the exhaustive ap-

proach. Therefore, the algorithm is computationally ecient only if m is small

relative to | ES | . In such a case, the enumeration strategy exhibits “near greedy

behavior.” The reason that greedy(m,B) generally performs well is that in

many cases the largest cost reductions often result from indexes that are good

candidate indexes by themselves. At the same time, however, it is important

to capture significant index interactions . These correspond to situations in

which a single index is not useful unless another index is also present in the

configuration (e.g., consider the case of a merge-based join simultaneously

using two covering indexes). Such index interactions justify the exhaustive

phase of greedy(m,B) , which helps capture interactions that have the most

significant effect on cost.

|

ES

6.1.1.1

Variants

A variant of the previously given enumeration algorithm additionally relies

on branch-and-bound. The idea is to use greedy(m,B) with a low value of

m to quickly generate a first-cut solution. Subsequently, configurations are

enumerated with greedy(m', k) for some m >

m . While we are calculating the

cost of the workload under a specific configuration, we can stop (and discard

the configuration) as soon as the cost of any prefix of the workload exceeds

the total cost of the first-cut solution. This approach executes greedy(m,B)

twice, but it might save significant what-if calls for undesirable configurations

during the second execution.

Alternatively, we can obtain several seed configurations (either by keeping

the top configurations found in lines 1-5 of Figure 6.3 or as result of using

other heuristics) and apply the greedy phase in lines 6-11 over each seed. Such

alternatives help prevent local minima and in general traverse a larger search

space with only a modest overhead.

We finally note that the initial formulation of greedy(m,B) was used to

enumerate multicolumn indexes after obtaining an enumeration space with

only single-column indexes. The idea is to call greedy(m,B) with just single-

column indexes in the enumeration space and to obtain back a configuration

C 1 . Then, we add to the enumeration space all two-column indexes that can

be extended from an index in C 1 and call greedy(m,B) again with this new set,

obtaining back C 2 . We repeat this procedure until indexes of a certain length

have been considered and return the best configuration overall. Since new

candidate selection mechanisms do consider multicolumn indexes, however,

this procedure is not widely used in practice anymore.

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home