Some workloads consist of a mixture of high-priority queries (which should
be evaluated as efficiently as possible) and low-priority queries (which are not
critical to the business and can tolerate some amount of delay). After gathering
a representative workload, we can handle such scenarios by assigning weights
to input queries. A common approach is to replace each query q by a pair
(q, w), where w is a number that represents the weight or importance of q.
Then, the cost of the workload W under a given configuration C is extended
to be Σ_i w_i · cost(q_i, C), and the search strategy stays unchanged.
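The weighted-cost extension above can be sketched in a few lines. This is a minimal illustration, not a tuning tool's actual interface: the query names, configuration label, and stubbed cost table are all made up, standing in for the optimizer's "what-if" cost estimates.

```python
# Hypothetical sketch: the cost of a weighted workload under a
# configuration is the weighted sum of the per-query costs.

def workload_cost(weighted_queries, config, cost):
    """Return sum_i w_i * cost(q_i, config) over (q_i, w_i) pairs."""
    return sum(w * cost(q, config) for q, w in weighted_queries)

# Usage with stubbed estimates; a real tool would instead ask the
# optimizer for the estimated cost of each query under the configuration.
estimates = {("q1", "C1"): 10.0, ("q2", "C1"): 4.0}
W = [("q1", 3), ("q2", 1)]  # q1 is three times as important as q2
total = workload_cost(W, "C1", lambda q, c: estimates[(q, c)])
```

Because only the cost function changes, the search over candidate configurations itself is untouched, which is why the text notes that the search strategy stays unchanged.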
7.2 Workload Compression
After a representative workload has been gathered (especially under the pro-
filing approach described previously), it is usually a good idea to postprocess
it before passing it to a tuning tool [see Figure 7.1(2)]. On one hand, the
workload might have many queries (or in general, profiled events ) that are
not relevant to a specific physical design problem. Examples are queries that
are defined over tables that do not require tuning or those that do not access
database tables directly but instead set database parameters or are invoked
from utilities such as backup/restore. If the query profiler did not filter out
such events (as described in the previous section), a postprocessing filtering
step is required.
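A postprocessing filter of this kind can be approximated with simple pattern matching. The sketch below is illustrative only: the set of tables under tuning, the event strings, and the keyword-based classification are assumptions, and a production tool would rely on proper SQL parsing rather than regular expressions.

```python
import re

# Hypothetical filter: keep only DML statements that reference at least
# one table under tuning; drop utility commands (SET options, backup or
# restore invocations) that a profiler may have captured alongside queries.
TABLES_UNDER_TUNING = {"orders", "lineitem"}  # assumed tuning scope

def is_relevant(stmt):
    head = stmt.lstrip().split(None, 1)[0].upper()
    if head not in {"SELECT", "INSERT", "UPDATE", "DELETE"}:
        return False  # e.g. SET STATISTICS ..., BACKUP DATABASE ...
    identifiers = set(re.findall(r"[a-z_]\w*", stmt.lower()))
    return bool(identifiers & TABLES_UNDER_TUNING)

events = [
    "SELECT o_id FROM orders WHERE o_date > '2004-01-01'",
    "SET STATISTICS TIME ON",
    "BACKUP DATABASE sales TO DISK = 'sales.bak'",
    "SELECT name FROM employees",  # table not being tuned
]
relevant = [e for e in events if is_relevant(e)]
```

Only the first event survives: the SET and BACKUP events never touch database tables, and the last query references a table outside the tuning scope.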
Even after discarding such nonrelevant queries, the resulting workload can
sometimes be very large. This issue affects the scalability of physical design
tools, since we would need to evaluate the cost of a large number of queries
for each candidate configuration. In such cases, a natural question is whether
tuning a smaller subset of the workload would be sufficient to give a
recommendation with approximately the same reduction in cost as the recommendation
that would have been obtained by tuning the entire workload.
A simple strategy to handle very large workloads is to eliminate duplicate
queries. Specifically, we keep a single instance of each distinct query in the
workload and assign it a weight that is equal to the number of times that
the query appears in the original workload. In general, however, queries in
the workload are parameterized and invoked with specific values via stored
procedures. In such cases, very few instances would be truly duplicates due to
varying parameter values, and this technique would not be effective. It is then
useful to relax the duplicate elimination strategy. The idea is
to leverage the inherent parametrization in query workloads by partitioning
the queries into equivalence classes based on their “signatures.” Concretely, two
queries have the same signature, and therefore belong to the same equivalence
class, if they are identical in all respects except for their constant values. In
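The signature-based grouping can be sketched as follows. The regexes below are illustrative stand-ins for real constant detection (a tuning tool would use the parser's parameterized form of each statement), and the sample workload is made up; each equivalence class is collapsed to one representative whose weight is the class size.

```python
import re
from collections import Counter

def signature(query):
    """Replace constants by a placeholder so that queries differing
    only in constant values map to the same signature."""
    q = re.sub(r"'[^']*'", "?", query)        # string constants -> ?
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)  # numeric constants -> ?
    return re.sub(r"\s+", " ", q.strip()).lower()

def compress(workload):
    """One (representative, weight) pair per equivalence class."""
    weights = Counter(signature(q) for q in workload)
    representative = {}
    for q in workload:
        representative.setdefault(signature(q), q)  # first query seen
    return [(representative[s], w) for s, w in weights.items()]

workload = [
    "SELECT * FROM orders WHERE o_id = 1",
    "SELECT * FROM orders WHERE o_id = 42",
    "SELECT name FROM customers WHERE city = 'Lima'",
]
compressed = compress(workload)  # two classes: orders (w=2), customers (w=1)
```

Plain duplicate elimination is the special case in which the signature is the query text itself; normalizing constants merely coarsens the partition.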
many cases, it is not enough to pick a single representative query from each