Database Reference
In-Depth Information
picked by the optimizer to evaluate any query in the workload whenever the
original candidate indexes are also present. Additionally, indexes usually in-
teract with each other, and therefore assigning a fixed benefit value to each
index will miss opportunities (see our discussion in the previous section in the
context of greedy(m,B) ). Finally, the simple heuristic to solve the knapsack
problem is ecient but can be twice as bad as the optimal solution, and we
might want to trade additional optimization time with the quality of the re-
sulting configuration. We next comment on some approaches to address these
limitations.
6.1.2.1
Improving Benefit Values
Rather than assigning the full benefit of a query q to all indexes used to
evaluate it under C ES , we can instead scale down this benefit value by the
number of indexes used in q 's plan. Additionally, we can assign a fraction of
the benefit of an index to all the derived candidates via merging, reduction, or
other operations. In this way, the benefit of an index more accurately describes
the reality during query optimization.
Even with these adjustments, we have the problem that index benefits are
not independent of one another. Therefore, any strategy that assigns benefits
to individual indexes and ignores interaction among them will incur approxi-
mation errors. That is, when the individual benefits of indexes are added up,
they will either underestimate or overestimate the actual benefit of such set of
indexes for the query. To alleviate this problem, we could try to assign benefits
to indexes in a more principled manner such that the previous approximation
error is the least. That is, suppose that
β(
)
C
is the true benefit of configura-
) = q W (
tion C for workload W , that is,
β(
C
cost
(
W
,
C 0 )
cost
(
W
,
C
))
.We
would like to obtain benefit values b
for indexes I in the enumeration space
such that for each configuration C we have that β(
(
I
)
) = I C b
C
(
I
) . Since the
overall benefit function β(
C
) can be arbitrary, it may not be possible to get
such an assignment for b
) values. The aim is to obtain an assignment that
minimizes the error between β(
(
I
) and I C b
C
(
I
) . Specifically, we would like
to obtain an assignment of b
(
I
) values that results in
C ES : β(
C
)
C
b
(
I
)
k
· β(
C
)
k
I C
for the smallest possible value k . This problem can be decomposed into finding
b
values for each query q in the workload and then adding up all the partial
benefits. For a given query and fixed value of k it is possible to obtain such
assignment (if it exists) by solving a linear program with the inequalities
above (a pair of equations for each C
(
I
)
C ES ). Note that since we solve a
linear program for each query, and indexes that are not relevant for the query
have zero benefit, the number of valid subsets C
C ES is not that big (we
need to consider subsets that contain candidate indexes only for the query in
consideration). Having a subroutine that solves the linear program for a fixed
Search WWH ::




Custom Search