Enumerating the Search Space - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

picked by the optimizer to evaluate any query in the workload whenever the

original candidate indexes are also present. Additionally, indexes usually in-

teract with each other, and therefore assigning a fixed benefit value to each

index will miss opportunities (see our discussion in the previous section in the

context of greedy(m,B) ). Finally, the simple heuristic to solve the knapsack

problem is ecient but can be twice as bad as the optimal solution, and we

might want to trade additional optimization time with the quality of the re-

sulting configuration. We next comment on some approaches to address these

limitations.

6.1.2.1

Improving Benefit Values

Rather than assigning the full benefit of a query q to all indexes used to

evaluate it under C ES , we can instead scale down this benefit value by the

number of indexes used in q 's plan. Additionally, we can assign a fraction of

the benefit of an index to all the derived candidates via merging, reduction, or

other operations. In this way, the benefit of an index more accurately describes

the reality during query optimization.

Even with these adjustments, we have the problem that index benefits are

not independent of one another. Therefore, any strategy that assigns benefits

to individual indexes and ignores interaction among them will incur approxi-

mation errors. That is, when the individual benefits of indexes are added up,

they will either underestimate or overestimate the actual benefit of such set of

indexes for the query. To alleviate this problem, we could try to assign benefits

to indexes in a more principled manner such that the previous approximation

error is the least. That is, suppose that

β(

)

is the true benefit of configura-

) = q ∈ W (

tion C for workload W , that is,

β(

cost

(

C 0 ) −

cost

(

))

.We

would like to obtain benefit values b

for indexes I in the enumeration space

such that for each configuration C we have that β(

(

)

) = I ∈ C b

(

) . Since the

overall benefit function β(

) can be arbitrary, it may not be possible to get

such an assignment for b

) values. The aim is to obtain an assignment that

minimizes the error between β(

(

) and I ∈ C b

(

) . Specifically, we would like

to obtain an assignment of b

(

) values that results in

C ES : β(

)

∀

⊆

≤

(

) ≤

· β(

)

I ∈ C

for the smallest possible value k . This problem can be decomposed into finding

values for each query q in the workload and then adding up all the partial

benefits. For a given query and fixed value of k it is possible to obtain such

assignment (if it exists) by solving a linear program with the inequalities

above (a pair of equations for each C

(

)

C ES ). Note that since we solve a

linear program for each query, and indexes that are not relevant for the query

have zero benefit, the number of valid subsets C

⊆

C ES is not that big (we

need to consider subsets that contain candidate indexes only for the query in

consideration). Having a subroutine that solves the linear program for a fixed

⊆

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home