The analogy does not stop here, as we can describe and analyze the physical
design problem in terms of the same three components that we used for the
query optimization problem: the search space of index configurations, the cost
model to evaluate configurations, and the enumeration strategy to traverse the
search space. In the next section we briefly discuss these components in the
context of the physical design problem and devote the next three chapters
to an in-depth discussion and analysis of these concepts. At the end of this
chapter we provide pointers to some earlier pieces of work in the context of
the physical design problem that might be interesting to readers studying the
evolution of ideas in the field.
3.2.1 Search Space
In the context of the physical design problem, the search space is the set of in-
dexes that could be considered candidates for a given database and workload.
We now discuss some choices that addressed this issue over the years. Some
decisions are purely pragmatic, while others depend on specific infrastructure
provided by the DBMS or its optimizer.
The first characterization of the search space that we discuss is very general
and independent of the actual workload. Initial work in the physical design
problem considered only single-column indexes. This choice was driven not
only by simplicity but also by the fact that most workloads were generated by
simple online transaction processing (OLTP) applications. In these scenarios,
queries are rather simple (mostly navigational queries, where a single tuple is
selected and possibly joined with other tables to obtain additional informa-
tion), and therefore single-column indexes are a good alternative. Advances in
query processing, coupled with ever-increasing application complexity, made
the single-column index approach inadequate. Current decision support appli-
cations often generate complex queries that require elaborate access path se-
lection and for which single-column indexes often fail to improve performance.
To handle these new scenarios, newer solutions expand the search space to in-
clude indexes that have at most a certain number of columns, while others
are unrestricted in that respect. Recent work in the physical design problem
considers multicolumn indexes in the search space.
Another characterization of the search space that is independent of the
workload concerns the search space of indexes for a single query. Driven by
simpler applications and query engines, early designs assumed that any query
would benefit from at most one index per table (and therefore excluded more
complex strategies such as index intersections). As with the previous charac-
terization, this restriction was lifted in more recent work.
After deciding on a generic characterization of the search space of indexes,
we still need to decide what the specific search space would be for a given
workload. After all, we would not want to include in the search space indexes
that are guaranteed to be useless (i.e., indexes over tables that are never men-
tioned in the workload). The traditional way to define a search space for a
given workload consists of two steps. First, there is a procedure to define the