12.4.3 Evaluation Metrics
The metric used to evaluate a given physical design tool is a crucial component of a benchmark. Usually, the existing literature uses a single number to measure the quality of recommendations, called percentage improvement. Given a baseline configuration C_0, the percentage improvement of a recommended configuration C_R for a workload W is given by

\[
100 \cdot \left( 1 - \frac{cost(C_R, W)}{cost(C_0, W)} \right).
\]

Improvement values can be negative (when the recommended configuration is less efficient than the initial one due to stricter space constraints) but are always smaller than 100%. In general, however, a single value like the percentage improvement might not provide enough detail to thoroughly evaluate and compare physical design tuners, and more detailed metrics might be necessary.
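To make the definition concrete, here is a minimal sketch of the computation in Python; the function name and inputs are illustrative, assuming the total workload costs under C_0 and C_R have already been measured.

```python
def percentage_improvement(cost_baseline: float, cost_recommended: float) -> float:
    """Percentage improvement of a recommended configuration C_R over a
    baseline C_0, i.e., 100 * (1 - cost(C_R, W) / cost(C_0, W))."""
    return 100.0 * (1.0 - cost_recommended / cost_baseline)

# Halving the workload cost yields 50%; doubling it yields -100%.
# The value is always smaller than 100%, since costs are positive.
print(percentage_improvement(200.0, 100.0))   # 50.0
print(percentage_improvement(100.0, 200.0))   # -100.0
```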
We next discuss a metric M that shows more detailed information about the quality of physical design solutions. Consider a workload W over a database D, and suppose that a tuner recommends configuration C for W. We can evaluate the quality of C using the function M_{C,W}, which returns, for an input time t, the fraction of queries in W that execute within time t:
\[
M_{C,W}(t) = \frac{\left| \{ q \in W : cost(q, C) \le t \} \right|}{|W|}
\]

where cost(q, C) is the actual execution time of query q under configuration C.
For pragmatic purposes, sometimes a time-out T_max is chosen and cost(q, C) is capped at T_max. Therefore, it is always the case that M_{C,W}(T_max) = 1.
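The following sketch mirrors the definition of M_{C,W} above, including the optional T_max cap; the helper function and the measured per-query execution times it takes as input are hypothetical.

```python
def m_metric(query_costs: list[float], t: float, t_max: float | None = None) -> float:
    """M_{C,W}(t): fraction of workload queries whose execution time under
    configuration C, capped at T_max if given, is at most t."""
    if t_max is not None:
        query_costs = [min(c, t_max) for c in query_costs]  # cap timed-out queries
    return sum(1 for c in query_costs if c <= t) / len(query_costs)

# Hypothetical execution times (in seconds) under some configuration C.
costs = [0.8, 12.0, 450.0, 1500.0]
print(m_metric(costs, t=500.0, t_max=1000.0))   # 0.75
print(m_metric(costs, t=1000.0, t_max=1000.0))  # 1.0, i.e., M_{C,W}(T_max) = 1
```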
Figure 12.5a shows a graphical representation of the M metric for three different configurations and a 22-query TPC-H workload. We can see in the figure that 90% of the queries ran in less than 500 seconds under C_1, whereas only 30% and 22% of the queries did the same under C_2 and C_3, respectively. The M metric can be used to compare multiple tuners simultaneously and makes possible some amount of goal-oriented evaluation (e.g., 30% of the queries should execute in subsecond time).
A drawback of the M metric is that it does not report per-query comparisons, because the individual queries are sorted in different orders. It is not possible, just by looking at Figure 12.5a, to draw conclusions about the performance of specific queries. For instance, although some queries were better under C_2 than under C_1, Figure 12.5a does not show this fact. A complementary metric, called I, focuses on query-by-query performance. Consider configurations C_1 and C_2 produced by two tuning tools. We then compute, for each query q_i in the workload, the value
\[
v_i = cost(q_i, C_1) - cost(q_i, C_2).
\]

Clearly, positive v_i values correspond to queries that were better under C_2 than under C_1, and negative v_i values correspond to the opposite situation.
We then sort the v_i values and plot the results. Figure 12.5b shows the I metric for the same workload described earlier. We can see that at least one query results in better cost under C_2 than under C_1, even though C_1 looks much better overall (as was also the case in Figure 12.5a). Although the I metric gives additional information on a per-query basis, it cannot be used to compare more than two configurations simultaneously.
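In the same illustrative style, the I metric reduces to computing and sorting per-query cost differences; the per-query times below are made up.

```python
def i_metric(costs_c1: list[float], costs_c2: list[float]) -> list[float]:
    """Sorted per-query differences v_i = cost(q_i, C_1) - cost(q_i, C_2).
    Positive values mark queries that ran faster under C_2."""
    return sorted(c1 - c2 for c1, c2 in zip(costs_c1, costs_c2))

# C_1 wins on three of the four hypothetical queries, C_2 on the last one.
under_c1 = [1.0, 5.0, 40.0, 300.0]
under_c2 = [9.0, 7.0, 90.0, 120.0]
print(i_metric(under_c1, under_c2))  # [-50.0, -8.0, -2.0, 180.0]
```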