Practical Aspects in Physical Database Design - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

equivalence class. The reason is that constants themselves might impact the

choice of physical designs. Consider, for instance, an equivalence class that

contains all queries that share the following template and have specific values

for parameters @v1 and @v2 :

SELECT R.a

FROM R

WHERE R.b < @v1 AND R.c > @v2

ORDER BY R.d

Depending on specific values of @v1 and @v2 , the best index for the result-

ing query could be R

(

)

if R.b < @v1 is very selective, R

(

)

instead R.c < @v2 is very selective, or even R

if both predicates

are not selective. If we pick only a single query from this equivalence class,

we might bias the resulting configuration toward a specific set of indexes. To

address this problem, we can pick a random subset from each equivalence

class. For instance, we can pick max

(

)

queries from equivalence

class EC (i.e., a 10% random sample of the equivalence class or five queries,

whichever is larger). In that way, each interesting combination of values in a

partition would probably be represented in the compressed workload, and the

resulting configuration would be close to the optimal one when considering

the full workload.

Note that the previous approach can be seen as an instance of a cluster-

ing problem. The idea is to create clusters of queries in the workload, where

each cluster contains queries that are similar to each other and different from

queries in other clusters, based on a given distance function. We then select

a representative subset of queries from each cluster and thus obtain the com-

pressed workload. The technique discussed previously considers a distance

function between queries defined as follows:

(

·|

| )

0 f q 1 and q 2 are identical except for constant values

distance

(

q 1 ,

q 2 ) =

otherwise

More advanced distance functions take into account domain-specific knowl-

edge but are more expensive to compute. For instance, we can define a distance

function as follows:

q 2 ) = | I(

q 1 ) ∩ I(

q 2 ) |

distance

(

q 1 ,

| I(

q 1 ) ∪ I(

q 2 ) |

where

is the set of candidate indexes for query q as defined in Section 4.1.

In this way, the more similar the candidate sets of two queries, the closer the

queries in terms of distance. This technique is more sophisticated than the

)

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home