12.4.3 Evaluation Metrics
The metric used to evaluate a given physical design tool is a crucial component of a benchmark. Usually, the existing literature uses a single number to measure the quality of recommendations, called percentage improvement. Given a baseline configuration C_0, the percentage improvement of a recommended configuration C_R for a workload W is given by

100 · (1 − cost(C_R, W) / cost(C_0, W)).

Improvement values can be negative (when the recommended configuration is less efficient than the initial one due to stricter space constraints), but they are always smaller than 100%. In general, however, a single value like the percentage improvement might not provide enough detail to thoroughly evaluate and compare physical design tuners, and more detailed metrics might be necessary.
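As a minimal illustration (not part of the original text; the function and variable names are ours), the following Python sketch computes the percentage improvement from the total workload costs under the baseline and recommended configurations:

def percentage_improvement(cost_baseline: float, cost_recommended: float) -> float:
    """100 * (1 - cost(C_R, W) / cost(C_0, W)): positive when C_R is cheaper,
    negative when C_R is more expensive, and always below 100%."""
    return 100.0 * (1.0 - cost_recommended / cost_baseline)

# Example: the workload costs 1,000s under C_0 and 400s under C_R.
print(percentage_improvement(1000.0, 400.0))   # 60.0 (a 60% improvement)
print(percentage_improvement(1000.0, 1250.0))  # -25.0 (C_R is worse than C_0)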
We next discuss a metric M that shows more detailed information about the quality of physical design solutions. Consider a workload W over a database D, and suppose that a tuner recommends configuration C for W. The quality of C is measured using M_{C,W}, which returns, for an input time t, the fraction of queries in W that executed faster than t:
M_{C,W}(t) = |{ q ∈ W : cost(q, C) ≤ t }| / |W|
where cost(q, C) is the actual execution time of query q under configuration C. For pragmatic purposes, sometimes a time-out T_max is chosen and cost(q, C) is capped at T_max. Therefore, it is always the case that M_{C,W}(T_max) = 1.
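The following Python sketch (our illustration, using hypothetical execution times) evaluates M_{C,W}(t) from a list of per-query execution times, capping each at T_max:

def m_metric(exec_times, t, t_max):
    """M_{C,W}(t): fraction of queries whose (T_max-capped) execution time
    under configuration C is at most t."""
    capped = [min(c, t_max) for c in exec_times]
    return sum(1 for c in capped if c <= t) / len(capped)

# Hypothetical per-query execution times (in seconds) under a configuration C:
times_c = [0.4, 2.0, 35.0, 120.0, 480.0, 700.0]
print(m_metric(times_c, 500.0, t_max=1000.0))   # ~0.83: five of six queries finish within 500s
print(m_metric(times_c, 1000.0, t_max=1000.0))  # 1.0: capping guarantees M(T_max) = 1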
Figure 12.5a shows a graphical representation of the M metric for three different configurations and a 22-query TPC-H workload. We can see in the figure that 90% of the queries ran in less than 500 seconds under C_1, whereas only 30% and 22% of the queries did the same under C_2 and C_3, respectively. The M metric can be used to compare multiple tuners simultaneously and makes possible some amount of goal-oriented evaluation (e.g., 30% of the queries should execute in subsecond time).
A drawback of the M metric is that it does not report per-query comparisons because the individual queries are sorted in different orders. It is not possible, just by looking at Figure 12.5a, to draw conclusions about the performance of specific queries. For instance, although some queries were better under C_2 than under C_1, Figure 12.5a does not show this fact. A complementary metric, called I, focuses on query-by-query performance. Consider configurations C_1 and C_2 produced by two tuning tools. We then compute, for each query q_i in the workload, the value
v_i = cost(q_i, C_2) − cost(q_i, C_1).
Clearly, positive v_i values correspond to queries that were better under C_1 than under C_2, and negative v_i values correspond to the opposite situation. We then sort the v_i values and plot the results. Figure 12.5b shows the I metric for the same workload described earlier. We can see that at least one query results in better cost under C_2 than under C_1, even though C_1 looks much better overall (as was also the case in Figure 12.5a). Although the I metric gives additional information on a per-query basis, it cannot be used to compare more than two configurations at a time.
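A corresponding Python sketch for the I metric (again our illustration, with hypothetical per-query times; the sign convention follows the discussion above, so positive values indicate queries that were faster under C_1):

def i_metric(costs_c1, costs_c2):
    """Per-query values v_i = cost(q_i, C_2) - cost(q_i, C_1), sorted for plotting.
    Positive v_i: q_i ran faster under C_1; negative v_i: faster under C_2."""
    assert len(costs_c1) == len(costs_c2)
    return sorted((c2 - c1 for c1, c2 in zip(costs_c1, costs_c2)), reverse=True)

# Hypothetical per-query times (seconds) under two recommended configurations:
c1 = [1.0, 5.0, 40.0, 90.0]
c2 = [3.0, 4.0, 200.0, 300.0]
print(i_metric(c1, c2))  # [210.0, 160.0, 2.0, -1.0]: the negative entry was better under C_2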