12.4.3 Evaluation Metrics
The metric used to evaluate a given physical design tool is a crucial component of a benchmark. Usually, the existing literature uses a single number to measure the quality of recommendations, called percentage improvement. Given a baseline configuration C_0, the percentage improvement of a recommended configuration C_R for a workload W is given by

\[
100 \cdot \left( 1 - \frac{cost(C_R, W)}{cost(C_0, W)} \right).
\]

Improvement values can be negative (when the recommended configuration is less efficient than the initial one due to stricter space constraints) but are always smaller than 100%. In general, however, a single value like the percentage improvement might not provide enough detail to thoroughly evaluate and compare physical design tuners, and more detailed metrics might be necessary.
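To make the definition concrete, here is a minimal sketch of the computation in Python; the function name and inputs are illustrative, assuming the total workload costs under C_0 and C_R have already been measured.

```python
def percentage_improvement(cost_baseline: float, cost_recommended: float) -> float:
    """Percentage improvement of a recommended configuration C_R over a
    baseline C_0, i.e., 100 * (1 - cost(C_R, W) / cost(C_0, W))."""
    return 100.0 * (1.0 - cost_recommended / cost_baseline)

# Halving the workload cost yields 50%; doubling it yields -100%.
# The value is always smaller than 100%, since costs are positive.
print(percentage_improvement(200.0, 100.0))   # 50.0
print(percentage_improvement(100.0, 200.0))   # -100.0
```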
We next discuss a metric M that shows more detailed information about the quality of physical design solutions. Consider a workload W over a database D, and suppose that a tuner recommends configuration C for W. We can evaluate the quality of C using the function M_{C,W}, which returns, for an input time t, the fraction of queries in W that execute within time t:
\[
M_{C,W}(t) = \frac{\left| \{ q \in W : cost(q, C) \le t \} \right|}{|W|}
\]

where cost(q, C) is the actual execution time of query q under configuration C.
For pragmatic purposes, sometimes a time-out T_max is chosen and cost(q, C) is capped at T_max. Therefore, it is always the case that M_{C,W}(T_max) = 1.
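The following sketch mirrors the definition of M_{C,W} above, including the optional T_max cap; the helper function and the measured per-query execution times it takes as input are hypothetical.

```python
def m_metric(query_costs: list[float], t: float, t_max: float | None = None) -> float:
    """M_{C,W}(t): fraction of workload queries whose execution time under
    configuration C, capped at T_max if given, is at most t."""
    if t_max is not None:
        query_costs = [min(c, t_max) for c in query_costs]  # cap timed-out queries
    return sum(1 for c in query_costs if c <= t) / len(query_costs)

# Hypothetical execution times (in seconds) under some configuration C.
costs = [0.8, 12.0, 450.0, 1500.0]
print(m_metric(costs, t=500.0, t_max=1000.0))   # 0.75
print(m_metric(costs, t=1000.0, t_max=1000.0))  # 1.0, i.e., M_{C,W}(T_max) = 1
```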
Figure 12.5a shows a graphical representation of the M metric for three different configurations and a 22-query TPC-H workload. We can see in the figure that 90% of the queries ran in less than 500 seconds under C_1, whereas only 30% and 22% of the queries did the same under C_2 and C_3, respectively. The M metric can be used to compare multiple tuners simultaneously and makes possible some amount of goal-oriented evaluation (e.g., 30% of the queries should execute in subsecond time).
A drawback of the M metric is that it does not report per-query comparisons, because the individual queries are sorted in different orders. It is not possible, just by looking at Figure 12.5a, to draw conclusions about the performance of specific queries. For instance, although some queries were better under C_2 than under C_1, Figure 12.5a does not show this fact. A complementary metric, called I, focuses on query-by-query performance. Consider configurations C_1 and C_2 produced by two tuning tools. We then compute, for each query q_i in the workload, the value
\[
v_i = cost(q_i, C_1) - cost(q_i, C_2).
\]

Clearly, positive v_i values correspond to queries that were better under C_2 than under C_1, and negative v_i values correspond to the opposite situation.
We then sort the v_i values and plot the results. Figure 12.5b shows the I metric for the same workload described earlier. We can see that at least one query results in better cost under C_2 than under C_1, even though C_1 looks much better overall (as was also the case in Figure 12.5a). Although the I metric gives additional information on a per-query basis, it cannot be used to compare more than two configurations simultaneously.
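In the same illustrative style, the I metric reduces to computing and sorting per-query cost differences; the per-query times below are made up.

```python
def i_metric(costs_c1: list[float], costs_c2: list[float]) -> list[float]:
    """Sorted per-query differences v_i = cost(q_i, C_1) - cost(q_i, C_2).
    Positive values mark queries that ran faster under C_2."""
    return sorted(c1 - c2 for c1, c2 in zip(costs_c1, costs_c2))

# C_1 wins on three of the four hypothetical queries, C_2 on the last one.
under_c1 = [1.0, 5.0, 40.0, 300.0]
under_c2 = [9.0, 7.0, 90.0, 120.0]
print(i_metric(under_c1, under_c2))  # [-50.0, -8.0, -2.0, 180.0]
```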