Characterizing the Search Space - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

clause, or if c is updated in an UPDATE statement. For instance, consider the

query following:

SELECT R.a, R.b

FROM R, S

WHERE R.x = S.y

ORDER BY R.a

The indexable columns in this case are R.a (because it belongs to the order-by

columns) and both R.x and S.y (because they are part of a valid predicate in

the WHERE clause).

Additionally, we need to identify the columns that are not indexable but still

mentioned in the query string and therefore required during query processing.

In the previous query, column R.b is not indexable but is required for the

proper evaluation of the query.

Once all relevant columns have been identified, we can obtain the set of

candidates for the query. There are different ways to define this initial can-

didate set, depending on the capabilities of the query engine. For instance,

suppose that the query engine can leverage only single-column indexes. In

that case, the candidate set for the query consists of one clustered index and

one nonclustered index for each indexable column. The query would result in

one clustered and one nonclustered index for columns R.a , R.x , and S.y , for

a total of six candidate indexes.

Modern query engines, however, can handle multicolumn indexes. Suppose

that I T is the set of indexable columns of table T , and R T is the set of non-

indexable columns of table T for the input query. The candidate indexes for

table T are all of the form T

(

K

|

R 1 ∪

R 2 )

, where K is a permutation of I

⊆

I T ,

R T } . That is, for each subset of the indexable

columns I T , we consider all its permutations as key columns, add subsets of

the remaining indexable columns as included columns, and optionally add the

nonindexable columns as additional included columns. For the previous query,

the candidate set is { R(x), R(x|b), R(x|a), R(x|a,b), R(a), R(a|b),

R(a|x), R(a|b,x), R(a,x), R(a,x|b), R(x,a), R(x,a|b), S(y) } . Ad-

ditionally, the candidate set contains one clustered index for each nonclustered

index in the previous set, with the same key columns.

Clearly the number of such indexes can be very large even for queries of

moderate complexity. We can obtain a smaller candidate set that is very likely

to contain the optimal configuration for the input query if we (1) consider

covering only indexes (i.e., indexes that contain all columns required in the

query), (2) restrict permutations of key columns to those that begin with all

equality predicates in the WHERE clause or with all columns in the group-by

or order-by columns, and (3) allow permutations only in which the remaining

key columns are in decreasing order of selectivity. Using these heuristics, the

candidate set for the previous query is

R 1

⊆ (

I T

−

I

) , and R 2

∈{∅ ,

{

}

plus the

R(a,x|b), R(x,a|b), S(y)

corresponding clustered indexes.

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home