Database Reference
In-Depth Information
6.1.2 Knapsack-Based Approach
The enumeration strategy for the physical design problem can also be seen as
an application of the classical knapsack combinatorial optimization problem.
Suppose we are given a set of items, each with a weight and a benefit value.
The knapsack problem consists of determining how many instances of each
item to put in a knapsack of capacity B so that the combined weight of all
instances is below B and the overall benefit is maximized. There are several
variations of the knapsack problem, and in this section we consider the 0-1
knapsack problem, in which we can pick each item at most once.
To model the enumeration strategy as a knapsack problem, we consider
each index in the enumeration space as an item. We then need to determine
appropriate values for the weight and benefit of each index. The weight of an
index is its estimated size, so the combined weight of a configuration is the
overall storage required to materialize it. Determining the benefit of an index
is more complex. A simple strategy to determine the benefit of an index is
as follows. Consider the configuration that has all indexes in the enumeration
space, and call it C ES (we pick only one clustering index per table from the
enumeration space so that the resulting configuration is valid). Also consider
the configuration with no indexes (or alternatively, the current configuration)
and call it C 0 . The benefit of an index I can then be obtained by finding all
queries that use I under C ES and adding their benefits in terms of execution
cost with respect to configuration C 0 . That is, the benefit for an index I ,
denoted as b
(
I
)
, is defined as follows:
cost
)
b
(
I
) =
(
q S ,
C 0 )
cost
(
q S ,
C ES )
updateCost
(
q
,
I
q W that uses I under C ES
where q S is the inner select query of an UPDATE query q (or q itself if q is a
SELECT query), and updateCost(
) is the estimated cost of updating index
I in the update shell of an UPDATE q (or zero if q is a SELECT query). That
is, the benefit of an index I is roughly measured by how much I can decrease
the cost of the workload in the best case.
Once we define weight and benefit values in this way, we can use well-known
approaches to solve the resulting knapsack instance. The knapsack problem,
however, is NP-hard, and therefore finding the optimal solution is unfeasible
in practice. A heuristic to solve the 0-1 knapsack problem is to sort indexes in
order of decreasing ratio of benefit to weight and to pick indexes in this order
until the available space is exhausted. This assignment performs very well in
practice, and a very simple refinement guarantees a factor 2 approximation
to the optimal solution.
There are, however, a few complications in this straightforward applica-
tion of the knapsack problem. First, the technique to assign benefits to index
would not produce accurate results for merged indexes, which would not be
q
,
I
Search WWH ::




Custom Search