Physical Database Design - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

Now consider the following query, which is very similar to Q2 but has a

slightly more complex predicate in the WHERE clause:

Q3 = SELECT B, C, D

FROM R

WHERE 50<B<100

AND61<2*D+1<81

On the surface, Q2 and Q3 are not very different. We can rewrite the pred-

icate on D to 30<D<40 and then, applying the same argument as in the

previous example, choose between I 3 =

) based

on statistical information on columns B and D . However, we are making the

additional assumption that the optimizer would recognize the algebraic ma-

nipulation that equates 61<2*D+1<81 and 30<D<40 . If the op-

timizer is unable to derive and leverage this identity, it would not be able to

use index I 4 to seek appropriate tuples (even when we know it is the best al-

ternative). In this scenario, if only I 4 is present, the optimizer would perform

a clustered index scan to answer the query, which in general would be worse

than the index-based alternative leveraging index I 3 .

To make the running example a bit more complex, suppose that our work-

load consists of both Q2 and Q3 . Then, the best configuration would be the

union of the best index for each query (since the workload does not have

updates, there is no performance penalty in adding indexes). Suppose that,

based on statistical information and knowledge about the optimizer's sim-

plification logic, we determine that the best configuration is I 1 =

(

) and I 4 =

(

)

(

)

and I 4

. Assume now that the storage required by both I 1 and

I 4 is unacceptably high (e.g., it might not fit in the available storage). In

this case, the situation is considerably more dicult. We can always decide

to drop either I 1 or I 4 (specifically, the one that results in the smaller im-

provement) and obtain a configuration that, at least, benefits one of the two

queries. However, consider an index like I 5 =

) , which is smaller

in size than I 1 and I 4 combined and thus might fit in the available storage.

This index is slightly worse than I 1 =

(

) for Q2 because of the addi-

tional included column D , which makes it wider and therefore more expensive

to scan. While we cannot seek I 5 for Q3 because its key column A does not

occur in the WHERE clause of Q3 , I 3 can still be scanned much more eciently

than the clustered index (it contains fewer columns). The net effect is that,

although I 5 is suboptimal for both Q2 and Q3 , it might still be better than

having a configuration with a single index that is optimal for either Q2 or

Q3 . Of course, I 6

(

should also be considered, since it helps

Q3 almost as much as the optimal I 4 and is better for Q2 than the plan us-

ing a clustered index. To make matters worse, other interesting alternatives

are possible as well. Consider the scenario where the available storage is even

more constrained and cannot accommodate index I 5 . In this situation, indexes

(

)

, which combined are smaller than I 5 , can be useful if I 5 itself

is too big. Depending on the number of tuples in the result of Q2 and Q3 , using

(

)

and R

(

)

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home