Enumerating the Search Space - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

value of k allows us to obtain the smallest k by performing a binary search

on k (solving a linear program at each iteration) and thus to calculate our

desired benefit assignment. Consider two indexes I 1 and I 2 , and suppose that

for a given query,

β( {

I 1 } )

β( {

I 2 } )

β( {

I 1 ,

I 2 } )

=20,

=10, and

=25. The original

formulation for benefit values would result in b

25 (or 12.5 if we

divide the whole benefit equally per index in the query). Using the approach

described already, we would obtain b

(

I 1 ) =

b

(

I 2 ) =

I 2 ) = 8 . 33, which

minimizes the deviation from the true benefit as little as possible.

(

I 1 ) =

16 . 67 and b

(

6.1.2.2

Extending the Enumeration Strategy

Even when using more accurate benefit values, these are still an approxima-

tion. If we base our search purely on such benefit values, we might miss con-

figurations that have a combined benefit value that is underestimated when

we add individual index benefits. It makes sense then to explore some addi-

tional points in the search space beyond what is traversed by the heuristic

solution to the 0-1 knapsack problem. For that purpose, a common approach

is to randomize the initial solution produced by the knapsack heuristic and

thus to explore nearby points in the search space. Specifically, we swap a small

number

of indexes in the current configuration with indexes not in the con-

figuration and thus obtain a configuration that is in the “neighborhood” of

the original one. We repeat this procedure and improve the current solution

until a prespecified time bound is reached returning the best configuration

overall.

Figure 6.4 illustrates the knapsack-based approach discussed in this sec-

tion. Lines 1-3 calculate the weight and benefit values for each index in the

enumeration space using any of the techniques discussed already. Lines 4 and

5 implement the heuristic approach to knapsack, and lines 6-8 implement the

randomized postprocessing of the initial solution.

δ

knapsack (ES: set of indexes, B:size constraint)

1

foreach I ∈ ES

2

w I = size of I

// See Section 6.1.2.1

3

b I = benefit of I for workload W

4

sort ES by decreasing b I /w I

5

C = longest prefix in ES that fits B

while (time does not expire)

6

C'=C-C −

∪ C +

// C −

⊆ C, C +

⊆ ES-C, | C −

|≤ δ , | C + |≤ δ

7

if (cost(W,C') < cost(W,C))C=C'

8

return C

9

FIGURE 6.4

Knapsack-based approach to traverse the enumeration space.

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home