Some workloads consist of a mixture of high-priority queries (which should
be evaluated as efficiently as possible) and low-priority queries (which are not
critical to the business and can tolerate some amount of delay). After gathering
a representative workload, we can handle such scenarios by assigning weights
to input queries. A common approach is to replace each query q by a pair
(q, w), where w is a number that represents the weight or importance of q.
Then, the cost of the workload W under a given configuration C is extended
to be Σ_i w_i · cost(q_i, C), and the search strategy stays unchanged.
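The weighted-cost extension above can be sketched in a few lines. This is a minimal illustration, not a tuning tool's actual interface: the query names, configuration label, and stubbed cost table are all made up, standing in for the optimizer's "what-if" cost estimates.

```python
# Hypothetical sketch: the cost of a weighted workload under a
# configuration is the weighted sum of the per-query costs.

def workload_cost(weighted_queries, config, cost):
    """Return sum_i w_i * cost(q_i, config) over (q_i, w_i) pairs."""
    return sum(w * cost(q, config) for q, w in weighted_queries)

# Usage with stubbed estimates; a real tool would instead ask the
# optimizer for the estimated cost of each query under the configuration.
estimates = {("q1", "C1"): 10.0, ("q2", "C1"): 4.0}
W = [("q1", 3), ("q2", 1)]  # q1 is three times as important as q2
total = workload_cost(W, "C1", lambda q, c: estimates[(q, c)])
```

Because only the cost function changes, the search over candidate configurations itself is untouched, which is why the text notes that the search strategy stays unchanged.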
7.2 Workload Compression
After a representative workload has been gathered (especially under the pro-
filing approach described previously), it is usually a good idea to postprocess
it before passing it to a tuning tool [see Figure 7.1(2)]. On one hand, the
workload might have many queries (or in general, profiled events ) that are
not relevant to a specific physical design problem. Examples are queries that
are defined over tables that do not require tuning or those that do not access
database tables directly but instead set database parameters or are invoked
from utilities such as backup/restore. If the query profiler did not filter out
such events (as described in the previous section), a postprocessing filtering
step is required.
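A postprocessing filter of this kind can be approximated with simple pattern matching. The sketch below is illustrative only: the set of tables under tuning, the event strings, and the keyword-based classification are assumptions, and a production tool would rely on proper SQL parsing rather than regular expressions.

```python
import re

# Hypothetical filter: keep only DML statements that reference at least
# one table under tuning; drop utility commands (SET options, backup or
# restore invocations) that a profiler may have captured alongside queries.
TABLES_UNDER_TUNING = {"orders", "lineitem"}  # assumed tuning scope

def is_relevant(stmt):
    head = stmt.lstrip().split(None, 1)[0].upper()
    if head not in {"SELECT", "INSERT", "UPDATE", "DELETE"}:
        return False  # e.g. SET STATISTICS ..., BACKUP DATABASE ...
    identifiers = set(re.findall(r"[a-z_]\w*", stmt.lower()))
    return bool(identifiers & TABLES_UNDER_TUNING)

events = [
    "SELECT o_id FROM orders WHERE o_date > '2004-01-01'",
    "SET STATISTICS TIME ON",
    "BACKUP DATABASE sales TO DISK = 'sales.bak'",
    "SELECT name FROM employees",  # table not being tuned
]
relevant = [e for e in events if is_relevant(e)]
```

Only the first event survives: the SET and BACKUP events never touch database tables, and the last query references a table outside the tuning scope.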
Even after discarding such nonrelevant queries, the resulting workload can
sometimes be very large. This issue affects the scalability of physical design
tools, since we would need to evaluate the cost of a large number of queries
for each candidate configuration. In such cases, a natural question is whether
tuning a smaller subset of the workload would be sufficient to give a
recommendation with approximately the same reduction in cost as the recommendation
that would have been obtained by tuning the entire workload.
A simple strategy to handle very large workloads is to eliminate duplicate
queries. Specifically, we keep a single instance of each distinct query in the
workload and assign it a weight that is equal to the number of times that
the query appears in the original workload. In general, however, queries in
the workload are parameterized and invoked with specific values via stored
procedures. In such cases, very few instances would be truly duplicates due to
varying parameter values, and this technique would not be effective. It is then
useful to relax the duplicate elimination strategy. The idea is
to leverage the inherent parametrization in query workloads by partitioning
the queries into equivalence classes based on their “signatures.” Concretely, two
queries have the same signature, and therefore belong to the same equivalence
class, if they are identical in all respects except for their constant values. In
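The signature-based grouping can be sketched as follows. The regexes below are illustrative stand-ins for real constant detection (a tuning tool would use the parser's parameterized form of each statement), and the sample workload is made up; each equivalence class is collapsed to one representative whose weight is the class size.

```python
import re
from collections import Counter

def signature(query):
    """Replace constants by a placeholder so that queries differing
    only in constant values map to the same signature."""
    q = re.sub(r"'[^']*'", "?", query)        # string constants -> ?
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)  # numeric constants -> ?
    return re.sub(r"\s+", " ", q.strip()).lower()

def compress(workload):
    """One (representative, weight) pair per equivalence class."""
    weights = Counter(signature(q) for q in workload)
    representative = {}
    for q in workload:
        representative.setdefault(signature(q), q)  # first query seen
    return [(representative[s], w) for s, w in weights.items()]

workload = [
    "SELECT * FROM orders WHERE o_id = 1",
    "SELECT * FROM orders WHERE o_id = 42",
    "SELECT name FROM customers WHERE city = 'Lima'",
]
compressed = compress(workload)  # two classes: orders (w=2), customers (w=1)
```

Plain duplicate elimination is the special case in which the signature is the query text itself; normalizing constants merely coarsens the partition.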
many cases, it is not enough to pick a single representative query from each