Characterizing the Search Space
The physical design problem introduced in Chapter 3 can be seen as a con-
strained optimization problem, in which the optimizing function is the overall
cost of the workload (counting both queries and updates) and the constraint
is the storage bound on the configuration size. In this chapter, we characterize
the search space for the physical design problem. Specifically, we explain how
to identify a set of indexes for a given workload and storage bound that in-
cludes the desired optimal index configuration. We do this in two steps. First,
in Section 4.1 we present approaches that address a simpler problem. Specifi-
cally, we explain how to obtain a small set of indexes that include the optimal
configuration for the case of a single SELECT query and no storage constraint.
Then, in Section 4.2 we extend this basic approach to general workloads. We
finish this chapter with a formal characterization of the search space for the
physical design problem.
4.1 Candidate Indexes for a Single SELECT Query
When the workload consists of a single SELECT query and there is no storage
bound, the physical design problem becomes much simpler. Adding indexes to
a configuration does not hurt performance (because indexes are not updated)
and does not invalidate the configuration (because there is no storage bound).
We now discuss approaches to find a small set of candidate indexes that include
the optimal configuration for a single SELECT query.
4.1.1 Parsing-Based Approach
The simplest way to obtain the candidate set of indexes for a given query is by
performing a syntactic analysis of the query string. We first parse the input
query and identify the set of indexable columns in the parse tree. A column c
is indexable if there is a predicate of the form “ c op expression ”inthe WHERE
clause of the query, if c belongs to the set of columns in a group-by or order-by