Characterizing the Search Space - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

Scenario 1:

|≤

1 , P

=∅

, O

=∅

. Suppose first that the index request

contains no sort columns (i.e., O

=∅

), no columns in nonsargable pred-

=∅

icates (i.e., P

), and at most one column associated with a sar-

gable range predicate (i.e.,

1). In this case, we can show that

no strategy using RID lookups is optimal. To see this, consider an

execution plan that looks up R A rows from index I A (stored in P A

pages) and subsequently performs that many RID lookups to obtain

the remaining columns. The number of I/Os required by this plan is

C A =

R A

P R ) , where P R is the number of pages used to store

R in a clustered index or heap. Consider now index I B , which contains

all columns in I A followed by all other required columns in the query

execution plan, and denote P B ( ≥

P A

| R | + min (

R A ,

P A ) as the number of pages used to

store I B . Since I B is a covering index, we can retrieve the same result as

before by seeking I B with the same predicates we used on I A but without

doing RID lookups. The number of I/Os of such a plan is C B

R A

P B

R A

We consider two cases. If R A ≥

P R , then C A =

| R | +

P R ≥

P B ≥

C B .

R A

Otherwise, if R A <

P R , C A =

P A

| R | +

R A =

| R | (

P A +|

| ) . Then, C A <

C B

if P A +|

P B . This is not possible, however, because each row fits in

one page, and therefore |

| <

P B .

Therefore, we can guarantee that the optimal plan does not use index

intersections or RID lookups, so it must seek a covering index. To obtain

this index we proceed as follows. First, we define the key columns of the

index as the set of columns in equality-based sargable predicates (in any

order), followed by the column in the range-based sargable predicate

with the smallest selectivity. ∗ Modern query engines can seek multiple

equality predicates (and one final range predicate) using a multicolumn

index simultaneously by concatenating the predicate values in the search

key. Therefore, the key columns of the index previously described are the

ones resulting in the seek with the smallest number of false positives.

The included columns in the index are all those in R , P , O , and A

that are not already in the index key. As an example, consider an index

request over table T with E

|≥

={ (

100

) }

, R

={ (

200

) }

, P

=∅

and A

}

. In this case, the optimal index would be T

(

)

=∅

Scenario 2: P

. Suppose first that there are multiple range-

based sargable predicates in R but that P and O are still empty. In

this case, the key columns for the optimal index are obtained in the

same way as in scenario 1. However, obtaining the included columns of

the index is more complex. A covering index guarantees that no RID

lookups would be necessary. At the same time, it can be more expensive

than a strategy that relies on a narrow index that includes a few columns

, O

∗ If there are no columns in sargable predicates (i.e., E

=∅ ) , the key of the index is the

column, among those in P , O , and A , with the smallest size.

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home