Physical Database Design - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

The analogy does not stop here, as we can describe and analyze the physical

design problem in terms of the same three components that we used for the

query optimization problem: the search space of index configurations, the cost

model to evaluate configurations, and the enumeration strategy to traverse the

search space. In the next section we briefly discuss these components in the

context of the physical design problem and devote the next three chapters

to an in-depth discussion and analysis of these concepts. At the end of this

chapter we provide pointers to some earlier pieces of work in the context of

the physical design problem that might be interesting to readers studying the

evolution of ideas in the field.

3.2.1 Search Space

In the context of the physical design problem, the search space is the set of in-

dexes that could be considered candidates for a given database and workload.

We now discuss some choices that addressed this issue over the years. Some

decisions are purely pragmatic, while others depend on specific infrastructure

provided by the DBMS or its optimizer.

The first characterization of the search space that we discuss is very general

and independent of the actual workload. Initial work in the physical design

problem considered only single-column indexes. This choice was driven not

only by simplicity but also by the fact that most workloads were generated by

simple online transaction processing (OLTP) applications. In these scenarios,

queries are rather simple (mostly navigational queries, where a single tuple is

selected and possibly joined with other tables to obtain additional informa-

tion), and therefore single-column indexes are a good alternative. Advances in

query processing, coupled with ever-increasing application complexity, made

the single-column index approach inadequate. Current decision support appli-

cations often generate complex queries that require elaborate access path se-

lection and for which single-column indexes often fail to improve performance.

To handle these new scenarios, newer solutions expand the search space to in-

clude indexes that have at most a certain number of columns, while others

are unrestricted in that respect. Recent work in the physical design problem

considers multicolumn indexes in the search space.

Another characterization of the search space that is independent of the

workload concerns the search space of indexes for a single query. Driven by

simpler applications and query engines, early designs assumed that any query

would benefit from at most one index per table (and therefore excluded more

complex strategies such as index intersections). As with the previous charac-

terization, this restriction was lifted in more recent work.

After deciding on a generic characterization of the search space of indexes,

we still need to decide what the specific search space would be for a given

workload. After all, we would not want to include in the search space indexes

that are guaranteed to be useless (i.e., indexes over tables that are never men-

tioned in the workload). The traditional way to define a search space for a

given workload consists of two steps. First, there is a procedure to define the

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home