Database Reference
In-Depth Information
equivalence class. The reason is that constants themselves might impact the
choice of physical designs. Consider, for instance, an equivalence class that
contains all queries that share the following template and have specific values
for parameters @v1 and @v2 :
SELECT R.a
FROM R
WHERE R.b < @v1 AND R.c > @v2
ORDER BY R.d
Depending on specific values of @v1 and @v2 , the best index for the result-
ing query could be R
(
b
,
c
,
d
,
a
)
if R.b < @v1 is very selective, R
(
c
,
b
,
d
,
a
)
if
instead R.c < @v2 is very selective, or even R
if both predicates
are not selective. If we pick only a single query from this equivalence class,
we might bias the resulting configuration toward a specific set of indexes. To
address this problem, we can pick a random subset from each equivalence
class. For instance, we can pick max
(
d
,
c
,
b
,
a
)
queries from equivalence
class EC (i.e., a 10% random sample of the equivalence class or five queries,
whichever is larger). In that way, each interesting combination of values in a
partition would probably be represented in the compressed workload, and the
resulting configuration would be close to the optimal one when considering
the full workload.
Note that the previous approach can be seen as an instance of a cluster-
ing problem. The idea is to create clusters of queries in the workload, where
each cluster contains queries that are similar to each other and different from
queries in other clusters, based on a given distance function. We then select
a representative subset of queries from each cluster and thus obtain the com-
pressed workload. The technique discussed previously considers a distance
function between queries defined as follows:
(
5
,
0
.
1
·|
EC
| )
0 f q 1 and q 2 are identical except for constant values
1
distance
(
q 1 ,
q 2 ) =
otherwise
More advanced distance functions take into account domain-specific knowl-
edge but are more expensive to compute. For instance, we can define a distance
function as follows:
q 2 ) = | I(
q 1 ) I(
q 2 ) |
distance
(
q 1 ,
| I(
q 1 ) I(
q 2 ) |
where
is the set of candidate indexes for query q as defined in Section 4.1.
In this way, the more similar the candidate sets of two queries, the closer the
queries in terms of distance. This technique is more sophisticated than the
I(
q
)
Search WWH ::




Custom Search