Characterizing the Search Space - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

associated with equality predicates, scanning such index still satisfies the

order in O ). The included columns in this case are obtained as in the

previous scenarios. We finally compare the cost of the two alternatives

(i.e., the one derived from scenario 3 with an additional sort operator

and the one that natively returns tuples in the right order) and return

the one with the minimal expected cost.

Consider an index request over table T with E

={ (

a

, 100 ) } , R

={ (

b

, 10 ) } ,

P

=∅ , O

={

c

} , and A

={

d

} . The best index ignoring the sort column is

) . This index does not return tuples in the right order, so the

resulting plan would have a sort operator on top. The alternative index that

returns tuples in the right order is I 2 =

I 1 =

T

(

a

,

b

|

c

,

d

) . The best index is the one

that results in the cheaper execution plan for the request.

T

(

a

,

c

|

b

,

d

4.1.4 Reducing the Candidate Set

The approaches discussed earlier obtain a candidate set of indexes that likely

contains the optimal configuration. Although the resulting candidate set is

usually small, there might be opportunities to further reduce its size. Specifi-

cally, indexes in the candidate set can be helpful for the query but nevertheless

conflicting among each other. For instance, an index on table S that is part

of an index-based join between an outer table R and S conflicts with another

index that seeks a single-table predicate in S and performs a hash join with

R (because both indexes cannot be used in the same execution plan).

A common mechanism to reduce the set of indexes is to rely on the query

optimizer itself. Suppose that we create all candidate indexes and optimize

the query. The optimizer can now leverage the good properties of each such

index and obtain the overall optimal plan. In such a case, we can inspect this

plan and identify which of the candidate indexes are used and can refine our

candidate set so that it contains only these. A well-behaved optimizer, faced

with this subset of indexes, would result in the same optimal plan.

Some clarifications are needed to further understand this idea. First, it

seems that we need to create a large set of indexes just to optimize a query

and obtain a reduced set of candidate indexes. Creating indexes is very expen-

sive and thus can be seen as a crucial drawback of this approach. However, as

we explain in Chapter 5, we can simulate the optimization of the input query

without actually creating the indexes. Second, there is a problem if the initial

candidate set contains multiple clustered indexes for the same table, because

the resulting configuration is invalid and cannot be materialized. In general,

to guarantee optimality, we need to perform the optimization step for each

combination of clustered indexes. Some approaches avoid this time-consuming

step by heuristically choosing a small set of alternatives (e.g., coordinate clus-

tered indexes in different tables so that join predicates can be answered di-

rectly) and performing a few optimization calls, retaining the indexes of the

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home