Database Reference
InDepth Information
clause, or if
c
is updated in an
UPDATE
statement. For instance, consider the
query following:
SELECT R.a, R.b
FROM R, S
WHERE R.x = S.y
ORDER BY R.a
The indexable columns in this case are
R.a
(because it belongs to the orderby
columns) and both
R.x
and
S.y
(because they are part of a valid predicate in
the
WHERE
clause).
Additionally, we need to identify the columns that are not indexable but still
mentioned in the query string and therefore required during query processing.
In the previous query, column
R.b
is not indexable but is required for the
proper evaluation of the query.
Once all relevant columns have been identified, we can obtain the set of
candidates for the query. There are different ways to define this initial can
didate set, depending on the capabilities of the query engine. For instance,
suppose that the query engine can leverage only singlecolumn indexes. In
that case, the candidate set for the query consists of one clustered index and
one nonclustered index for each indexable column. The query would result in
one clustered and one nonclustered index for columns
R.a
,
R.x
, and
S.y
, for
a total of six candidate indexes.
Modern query engines, however, can handle multicolumn indexes. Suppose
that
I
T
is the set of indexable columns of table
T
, and
R
T
is the set of non
indexable columns of table
T
for the input query. The candidate indexes for
table
T
are all of the form
T
(
K

R
1
∪
R
2
)
, where
K
is a permutation of
I
⊆
I
T
,
R
T
}
. That is, for each subset of the indexable
columns
I
T
, we consider all its permutations as key columns, add subsets of
the remaining indexable columns as included columns, and optionally add the
nonindexable columns as additional included columns. For the previous query,
the candidate set is
{
R(x), R(xb), R(xa), R(xa,b), R(a), R(ab),
R(ax), R(ab,x), R(a,x), R(a,xb), R(x,a), R(x,ab), S(y)
}
. Ad
ditionally, the candidate set contains one clustered index for each nonclustered
index in the previous set, with the same key columns.
Clearly the number of such indexes can be very large even for queries of
moderate complexity. We can obtain a smaller candidate set that is very likely
to contain the optimal configuration for the input query if we (1) consider
covering only indexes (i.e., indexes that contain all columns required in the
query), (2) restrict permutations of key columns to those that begin with all
equality predicates in the
WHERE
clause or with all columns in the groupby
or orderby columns, and (3) allow permutations only in which the remaining
key columns are in decreasing order of selectivity. Using these heuristics, the
candidate set for the previous query is
R
1
⊆
(
I
T
−
I
)
, and
R
2
∈{∅
,
{
}
plus the
R(a,xb), R(x,ab), S(y)
corresponding clustered indexes.
Search WWH ::
Custom Search